Custom Weibull distribution

Hi.
I’d like to use a customized Weibull distribution as the likelihood function; it is not that different from the original form.
The general form of the Weibull distribution is expressed as follows:
https://docs.pymc.io/api/distributions/continuous.html#pymc3.distributions.continuous.Weibull

f(x|\alpha,\beta)=\frac{\alpha}{\beta}(\frac{x}{\beta})^{\alpha-1}\mathrm{exp}(-(\frac{x}{\beta})^\alpha)

The custom Weibull distribution I want to use has just one extra location parameter l, as follows:

f(x|\alpha,\beta,l)=\frac{\alpha}{\beta}(\frac{x-l}{\beta})^{\alpha-1}\mathrm{exp}(-(\frac{x-l}{\beta})^\alpha)
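
Equivalently, its log-density (which is what a custom logp implementation needs) is:

\log f(x|\alpha,\beta,l)=\log\alpha-\log\beta+(\alpha-1)\left(\log(x-l)-\log\beta\right)-\left(\frac{x-l}{\beta}\right)^\alpha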

Here, all the parameters \alpha, \beta, l have Gamma distributions as their priors.
The shape and scale parameters of Gamma are defined in a deterministic way.
In this case, what is the best way to model this?
I know how to use the as_op operator, but I’d rather not use it because then I can’t use gradient-based samplers.
I’ve seen that there is pm.DensityDist for defining a custom distribution, but its capability seems somewhat limited and I couldn’t find a comprehensive guide on using pm.DensityDist.
If the custom Weibull I mentioned above can be modeled with pm.DensityDist, would you please give me some guidance on defining it?
If it is not possible to model it using pm.DensityDist, then any other advice would be much appreciated.

For DensityDist you just need a theano-implemented version of the log-likelihood. A random sampler is helpful, but you can cheat a little bit and have NUTS draw samples for you if you want some data:

import pymc3 as pm
import theano.tensor as tt

# log-density of the shifted Weibull, written with theano ops so NUTS can take gradients
def custom_weibull(x, a, b, l):
    return tt.log(a) - tt.log(b) + (a - 1) * (tt.log(x - l) - tt.log(b)) - tt.pow((x - l) / b, a)

with pm.Model() as PriorModel:
    # fixed "true" parameter values
    a_true = pm.Constant('at', 3.)
    b_true = pm.Constant('bt', 12.)
    l_true = pm.Constant('lt', 1.)
    # xl is forced to stay above the location parameter
    x_look = pm.Deterministic('xl', l_true + pm.HalfNormal('xc', 10.))
    cweibull = pm.DensityDist('cWeibull', logp=custom_weibull,
                              observed={'a': a_true, 'b': b_true, 'l': l_true, 'x': x_look})
    res = pm.sample(1000, nuts_kwargs={'target_accept': 0.9, 'max_treedepth': 25})
    
pm.traceplot(res);

xl here holds an (approximately) custom-Weibull-distributed variable (if you want it to be exact, use a massive improper uniform density instead; but HalfNormal(10) is pretty close), so we can take some of its samples and use them as data for inference:

import theano

# treat a slice of the prior-model samples as observed data
obs_data = res['xl'][500:700]

with pm.Model() as inference_model:
    a_pr = pm.HalfNormal('a_pr', 2.)
    b_pr = pm.HalfNormal('b_pr', 2.)
    l_pr = pm.HalfNormal('l_pr', 2.)
    cweibull = pm.DensityDist('cWeibull', logp=custom_weibull,
                              observed={'a': a_pr, 'b': b_pr, 'l': l_pr, 'x': theano.shared(obs_data)})
    res_inf = pm.sample(1000, nuts_kwargs={'target_accept': 0.9, 'max_treedepth': 25})

pm.traceplot(res_inf)

Now there are lots of divergences under this parameterization, and I’m sure they all come from

tt.log(x - l)

since l can be sampled to values that exceed some of the observations. It might be a good idea to re-parameterize so that you sample a positive variable “x_minus_l” and don’t have to worry about invalid log-likelihoods.

Good luck!


By the way, it looks like you’re using a shifted version of the Weibull. I think the right way of doing this is actually to use the Transform API. Shockingly, translation and scaling are not default transforms (?? @junpenglao), but it’s easy to implement:

class Translate(pm.distributions.transforms.Transform):
  def __init__(self, v):
    self.v_ = v

  def forward(self, x):
    return x + self.v_

  def forward_val(self, x):
    return x + self.v_

  def backward(self, x):
    return x - self.v_

  def jacobian_det(self, x):
    return 0. * x  # log(1) = 0

  def apply(self, dist):
    return pm.distributions.transforms.TransformedDistribution.dist(dist, self)

with pm.Model() as model:
  l = pm.Gamma('l_pr', alpha=3., beta=3.)
  a = pm.Gamma('a_pr', alpha=3., beta=3.)
  b = pm.Gamma('b_pr', alpha=3., beta=3.)
  shift = Translate(l)
  weib = shift.apply(pm.Weibull('base', alpha=a, beta=b))

Now you even have access to the distribution’s logp function and RNG!
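
For example, here is a rough sketch of that access using the plain pm.Weibull via .dist() (a stand-alone distribution object, outside any model); the shifted version would subtract l from the value before evaluating logp:

import pymc3 as pm

base = pm.Weibull.dist(alpha=3., beta=12.)   # stand-alone distribution object
print(base.logp(5.).eval())                  # log-density at x = 5 (a theano scalar, hence .eval())
print(base.random(size=5))                   # five draws from the distribution's RNG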


Oh yeah, an affine transformation is pretty straightforward and we should definitely add that.

This is a really nice way of using it and somehow it never occurred to me! You should send a pull request and add this to the docs.


Thank you very much @chartl.
Your guide code definitely works but I have a question.
Actually, most of the examples I could find used pm.DensityDist as a prior.
The one thing that puzzles me about using pm.DensityDist as a likelihood function is that the code looks like it treats all the priors as part of the observed data, which is quite different from typical modeling in pyMC3.
If a shifted Weibull distribution were implemented in pyMC3, then the likelihood would look like:

likelihood = cWeibull('obs', alpha=a, beta=b, loc=l, observed=obs_data)

The code you kindly presented looks like:

likelihood = pm.DensityDist('cWeibull', logp=custom_weibull, observed={'a': a_pr, 'b':  b_pr, 'l': l_pr, 'x': theano.shared(obs_data)})

The only variable that contains observed data is “obs_data”, but in the pm.DensityDist likelihood it looks like the priors are also part of the observed dictionary.
Actually, either way the result I get is almost the same, which surprises me quite a bit.

For your information, before you showed me how to use pm.DensityDist, I had defined the custom Weibull written above as follows:

import numpy as np
import pymc3 as pm
import theano.tensor as tt

from pymc3.theanof import floatX
from pymc3.distributions import transforms
from pymc3.util import get_variable_name
from pymc3.distributions.dist_math import (
    alltrue_elemwise, betaln, bound, gammaln, i0e, logpow,
    normal_lccdf, normal_lcdf, SplineWrapper, std_cdf)
from pymc3.distributions.distribution import (Continuous, draw_values, generate_samples)


class cWeibull(pm.Continuous):

    def __init__(self, alpha, beta, loc, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.alpha = alpha = tt.as_tensor_variable(floatX(alpha))
        self.beta = beta = tt.as_tensor_variable(floatX(beta))
        self.loc = loc = tt.as_tensor_variable(floatX(loc))

    def logp(self, value):
        alpha = self.alpha
        beta = self.beta
        loc = self.loc

        # -inf whenever value <= loc; otherwise the shifted-Weibull log-density,
        # additionally bounded to the valid parameter ranges
        return tt.switch(tt.le(value - loc, 0.0), -np.inf,
                         bound(tt.log(alpha) - tt.log(beta)
                               + (alpha - 1) * tt.log((value - loc) / beta)
                               - ((value - loc) / beta)**alpha,
                               value >= 0, alpha > 0, beta > 0, loc >= 0))

I’d like to know why I get the same result, and whether there is any problem with defining a custom likelihood function in that way (using pm.DensityDist).

Additionally, you mentioned a positive variable “x_minus_l”.
You have pinpointed exactly the painful part I am struggling with.
Because I am dealing with wind speed, I defined the custom Weibull using tt.switch and bound in the code above to prevent the log-likelihood from diverging to infinity when x - l is negative.
Is there a smarter way to prevent the log-likelihood from diverging?

Thanks a lot.

I agree. It would be nice to have different kwargs in DensityDist, like priors= and observed=. In practice (@ferrine or @junpenglao can correct me if this is wrong), the logic inside DensityDist is: if an entry is a pm.Distribution object, it is coming from the prior (and should be sampled), while if it’s a theano shared variable, it represents observations.
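
To make that rule concrete, here is the likelihood from the earlier snippet again, with each entry of the observed dict annotated (same names as above):

likelihood = pm.DensityDist(
    'cWeibull',
    logp=custom_weibull,
    observed={'a': a_pr,                      # pm.Distribution -> prior, gets sampled
              'b': b_pr,                      # pm.Distribution -> prior, gets sampled
              'l': l_pr,                      # pm.Distribution -> prior, gets sampled
              'x': theano.shared(obs_data)})  # shared variable  -> observed data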

I’m not quite sure what you’re asking. If the logps are the same, the samples should be the same…

Since you require that x - l > 0, why not do something like

min_obs_x = np.min(x)
# keep l below the smallest observation so that x - l stays positive
BoundedGamma = pm.Bound(pm.Gamma, upper=min_obs_x)
l_prior = BoundedGamma('l_prior', **params)

The other parameterization (that I was talking about) would abandon the gamma prior on l, and instead do something like

min_obs_x = np.min(x)
x_minus_l = pm.HalfNormal('delta_x_l', 10.)
l = pm.Deterministic('l_prior', min_obs_x - x_minus_l)

Thank you for your valuable advice @chartl.
Another question I have concerns the use of theano.shared.

while if it’s a theano shared variable, it represents observations.

I have searched a lot to understand the meaning of a theano shared variable, but I couldn’t understand it in the context of pyMC3 model definition.
That is, how does pyMC3 recognize a theano shared variable as observations when I use pm.DensityDist?
Moreover, even when I write the likelihood function without theano.shared, the sampling itself works perfectly and the results are the same. For example:

# without using theano.shared
likelihood1 = pm.DensityDist('cWeibull', logp=custom_weibull, observed={'a': a_pr, 'b': b_pr, 'l': l_pr, 'x': obs_data})

# using theano.shared
likelihood2 = pm.DensityDist('cWeibull', logp=custom_weibull, observed={'a': a_pr, 'b': b_pr, 'l': l_pr, 'x': theano.shared(obs_data)})

Even when I used likelihood1, I got the same posteriors.
I really want to know how I obtained the same results.
Where can I find some documentation to fully understand theano.shared in the context of pyMC3 model definition?

Thank you in advance.

I would recommend going through http://deeplearning.net/software/theano/tutorial/adding.html to get a sense of theano variables (the same kinds of principles will apply to tensorflow and pymc4).

In brief, theano variables are not assigned a value until you pass them in explicitly. shared objects are the exception to this rule, and are instantiated with values at the outset.
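
Here is a minimal sketch of that difference (plain theano, nothing pymc3-specific):

import numpy as np
import theano
import theano.tensor as tt

x = tt.dscalar('x')                # symbolic: has no value until you pass one in
s = theano.shared(np.float64(2.))  # shared: carries a value from the moment it is created

f = theano.function([x], x + s)    # x must be supplied at call time; s is already baked in
print(f(3.))                       # 5.0
print(s.get_value())               # 2.0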

The reason you obtain the same results is far simpler: if you pass in a numpy array as observed data, pymc3 applies theano.shared as a convenience.


This option seems very nice to me; however, when running the following code:
import pymc3 as pm

class Translate(pm.distributions.transforms.Transform):
  def __init__(self, v):
    self.v_ = v

  def forward(self, x):
    return x + self.v_

  def forward_val(self, x):
    return x + self.v_

  def backward(self, x):
    return x - self.v_

  def jacobian_det(self, x):
    return 0. * x  # log(1) = 0

  def apply(self, dist):
    return pm.distributions.transforms.TransformedDistribution.dist(dist, self)

with pm.Model() as model:
  l = pm.Gamma('l_pr', alpha=3, beta=3)
  a = pm.Gamma('a', alpha=3, beta=3)
  b = pm.Gamma('b', alpha=3, beta=3)
  shift = Translate(l)
  weib = shift.apply(pm.Weibull('base', alpha=a, beta=b))
  trace = pm.sample(100)

I get the following error:

   Traceback (most recent call last):
  File "C:/Users/xxx/source/xxx/run_scripts/xxx/test.py", line 28, in <module>
    weib = shift.apply(pm.Weibull('base', alpha=a, beta=b))
  File "C:/Users/xxx/source/xxx/run_scripts/xxx/test.py", line 21, in apply
    return pm.distributions.transforms.TransformedDistribution.dist(dist, self)
  File "C:\Users\xxx\AppData\Local\Continuum\anaconda3\envs\xxx\lib\site-packages\pymc3\distributions\distribution.py", line 52, in dist
    dist.__init__(*args, **kwargs)
  File "C:\Users\xxx\AppData\Local\Continuum\anaconda3\envs\xxx\lib\site-packages\pymc3\distributions\transforms.py", line 120, in __init__
    testval = forward(dist.default())
AttributeError: 'TransformedRV' object has no attribute 'default'

Thanks!

I fixed it by changing the apply function to:

  def apply(self, rv):
    return pm.distributions.transforms.TransformedDistribution.dist(rv.distribution, self)