Issues fitting a toy Lomax model example

I have a pretty simple example that doesn’t seem to work. My goal is to build a Lomax model, and since PyMC3 doesn’t have a Lomax distribution I use the fact that an Exponential mixed with a Gamma is a Lomax (see here):

import pymc3 as pm
from scipy.stats import lomax

# Generate artificial data with a shape and scale parameterization
data = lomax.rvs(c=2.5, scale=3, size=1000)

# if t ~ Exponential(lamda) and lamda ~ Gamma(shape, rate), then marginally t ~ Lomax(shape=shape, scale=rate)
with pm.Model() as hierarchical:
    shape = pm.Uniform('shape', 0, 10)
    rate = pm.Uniform('rate', 0, 10)
    lamda = pm.Gamma('lamda', alpha=shape, beta=rate)
    t = pm.Exponential('t', lam=lamda, observed=data)
    trace = pm.sample(1000, tune=1000)

The summary is:

>>> pm.summary(trace)
           mean        sd  mc_error   hpd_2.5  hpd_97.5   n_eff      Rhat
shape  4.259874  2.069418  0.060947  0.560821  8.281654  1121.0  1.001785
rate   6.532874  2.399463  0.068837  2.126299  9.998271  1045.0  1.000764
lamda  0.513459  0.015924  0.000472  0.483754  0.545652  1096.0  0.999662

I would expect the shape and rate estimates to be close to 2.5 and 3 respectively. I tried various non-informative priors for shape and rate, including pm.HalfFlat() and pm.Uniform(0, 100), but both resulted in worse fits. Any ideas?

The scipy parameterization is sometimes different from the PyMC3 parameterization. For example, st.gamma.logpdf(x, a, scale=1.0/b) in scipy corresponds to pm.Gamma.dist(a, b).logp(x) in PyMC3. But in this case the parameterization seems to be correct (assuming what Wikipedia says is correct).
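As a quick numerical check of that scale = 1/rate correspondence, here is a scipy-only sketch (gamma_logp_rate is just an illustrative helper written for this post, not a library function):

```python
import numpy as np
from scipy import stats
from scipy.special import gammaln

def gamma_logp_rate(x, a, b):
    # Gamma log-pdf in the shape/rate form that pm.Gamma uses
    return a * np.log(b) - gammaln(a) + (a - 1) * np.log(x) - b * x

# scipy's scale is the inverse of PyMC3's rate
x, a, b = 2.0, 3.0, 1.5
assert np.isclose(stats.gamma.logpdf(x, a, scale=1.0 / b), gamma_logp_rate(x, a, b))
```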

I cannot quite work out why this parameterization doesn't work, but in this case it might be easier to port the scipy version into PyMC3:

import theano.tensor as tt

def lomax_logp(value, alpha, lam):
    # Lomax log-pdf: log(alpha) - (alpha + 1) * log(1 + value/lam) - log(lam)
    x = value / lam
    _logpdf = tt.log(alpha) - (alpha + 1) * tt.log1p(x)
    return _logpdf - tt.log(lam)

with pm.Model() as m:
    # using HalfFlat as a demo; do not do this if you are sampling.
    shape = pm.HalfFlat('shape')
    scale = pm.HalfFlat('scale')
    pm.DensityDist('obs', lomax_logp,
                   observed=dict(value=data, alpha=shape, lam=scale))
    map1 = pm.find_MAP()

which gives:

{'scale': array(3.00431805),
 'scale_log__': array(1.10005061),
 'shape': array(2.59470935),
 'shape_log__': array(0.95347451)}
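As a sanity check that this logp matches scipy's Lomax parameterization, here is a numpy mirror of the same expression (lomax_logp_np is just for illustration):

```python
import numpy as np
from scipy.stats import lomax

def lomax_logp_np(value, alpha, lam):
    # numpy version of the theano lomax_logp above
    return np.log(alpha) - (alpha + 1) * np.log1p(value / lam) - np.log(lam)

t = np.array([0.5, 2.0, 7.0])
# scipy's c is the shape alpha, its scale is lam
assert np.allclose(lomax_logp_np(t, 2.5, 3.0), lomax.logpdf(t, c=2.5, scale=3.0))
```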

Thank you for the reply! I actually discovered my mistake: to derive a Lomax as an Exponential mixed with a Gamma, I need to fit a lambda for each observation, i.e. lamda = pm.Gamma('lamda', alpha=shape, beta=rate, shape=len(data))

Once I added this, the estimates for shape and rate were what I expected. But thank you for the custom distribution example, that’s very helpful.
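For reference, the mixture identity itself can be verified numerically with scipy alone, by integrating the Exponential likelihood against the Gamma prior on the rate:

```python
import numpy as np
from scipy import integrate, stats

alpha, beta, t = 2.5, 3.0, 1.7
# integrate Exponential(t | lam) against the Gamma(alpha, rate=beta) prior on lam
marginal, _ = integrate.quad(
    lambda lam: stats.expon.pdf(t, scale=1.0 / lam) * stats.gamma.pdf(lam, alpha, scale=1.0 / beta),
    0, np.inf)
# the marginal is exactly Lomax(shape=alpha, scale=beta)
assert np.isclose(marginal, stats.lomax.pdf(t, c=alpha, scale=beta))
```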

Follow up question: Why shouldn’t Flat be used when sampling?

A couple of reasons:

  1. It's not generative, meaning that you cannot generate data from a Flat distribution.
  2. It's unrealistic: a Flat prior assumes the parameter could plausibly take arbitrarily large values.
  3. It's computationally difficult: it puts too much weight in regions with little information, which often display funnel-like behaviour.

More information can be found in the Stan wiki's Prior Choice Recommendations, and in the recent paper "The prior can generally only be understood in the context of the likelihood".

Thank you, that’s helpful. Sorry, one last question that has come up as I’ve built out this model.

I’m conducting survival analysis, and as such I have censored data. I read this quick example of censored data in the PyMC3 docs, and want to do something similar:

def lomax_logp(t, complete_observation, shape, rate):
    scale = 1 / rate
    if complete_observation:
        # complete observation: Lomax log-pdf
        logp = shape * (tt.log(scale) - tt.log(scale + t)) + tt.log(shape) - tt.log(scale + t)
    else:
        # censored observation: log-survival, i.e. log(1 - CDF)
        logp = shape * (tt.log(scale) - tt.log(scale + t))
    return logp

I ended up using slightly different notation than you did above, but it should be the same maths. Then I ran this model:

with pm.Model():
    shape = pm.HalfFlat('shape')
    rate = pm.HalfFlat('rate')
    pm.DensityDist('obs', lomax_logp, observed=dict(t=data_lomax, complete_observation=complete_observations, shape=shape, rate=rate))
    map = pm.find_MAP()

Where complete_observations is a numpy array of booleans. It isn’t working, though, because complete_observations is a TensorConstant inside lomax_logp, so the Python if doesn’t branch per observation. How can I modify this to make it work?

I saw this example in the pymc3 source code, but the approach above seems more readable if we could get it working. Thank you again!

I assume that complete_observations is supposed to be a vector of 0s and 1s indicating whether each observation is censored?
If that’s the case, try something like this:

return shape * (tt.log(scale) - tt.log(scale + t)) + complete_observation*(tt.log(shape) - tt.log(scale + t))

PS, are you sure the tt.log(shape) term is correct here? If there are censored values, shouldn’t you count only the non-censored ones?
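If it helps, the masked expression can be sanity-checked in plain numpy against scipy's Lomax logpdf and logsf (a sketch, using your rate = 1/scale notation):

```python
import numpy as np
from scipy.stats import lomax

shape, rate = 2.5, 1.0 / 3.0  # rate = 1/scale, so the Lomax scale is 3
scale = 1.0 / rate
t = np.array([1.0, 4.0])
complete = np.array([1, 0])   # first observed, second censored

# one vectorized expression: survival term for everyone,
# density correction only where complete == 1
logp = shape * (np.log(scale) - np.log(scale + t)) + complete * (np.log(shape) - np.log(scale + t))

assert np.isclose(logp[0], lomax.logpdf(t[0], c=shape, scale=scale))  # log-pdf
assert np.isclose(logp[1], lomax.logsf(t[1], c=shape, scale=scale))   # log(1 - CDF)
```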

Ah, I see: complete_observations needs to be part of the Theano expression, thanks. And yes, I do want to include the censored examples in the likelihood, since they inform the survival distribution, if that’s what you’re asking.

Unrelated: is it possible to coerce scipy functions into valid Theano objects? For instance, I’m trying to use scipy’s betainc (regularized incomplete beta function) for the CDF of a beta distribution, but I’m getting:

TypeError: ufunc 'betainc' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

Actually, I think I may have solved this?

from scipy.special import betainc as regularized_beta
from theano.compile.ops import as_op
@as_op(itypes=[tt.dscalar, tt.dscalar, tt.dvector], otypes=[tt.dvector])
def reg_beta(a, b, x):
    return regularized_beta(a, b, x)

Using this log probability:

def gamma_gamma_logp(t, complete, shape, rate_shape, rate_rate):
    x = t / (t + rate_rate)
    # log of the Beta function, via gammaln
    log_beta = tt.gammaln(shape) + tt.gammaln(rate_shape) - tt.gammaln(shape + rate_shape)
    # complete observations: log-pdf of the gamma-gamma marginal
    logpdf = (rate_shape * tt.log(rate_rate) + (shape - 1) * tt.log(t)
              - log_beta - (shape + rate_shape) * tt.log(rate_rate + t))
    # censored observations: log-survival = log(1 - CDF), hence log1p(-CDF)
    logsf = tt.log1p(-reg_beta(shape, rate_shape, x))
    return complete * logpdf + (1 - complete) * logsf

find_MAP() works, though it gives

Warning: gradient not available.(E.g. vars contains discrete variables). MAP estimates may not be accurate for the default parameters. Defaulting to non-gradient minimization 'Powell'.

so I’ll need to investigate that a bit
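In case it's useful, the gamma-gamma closed forms can be checked numerically with scipy alone: the marginal density against direct integration, and the fact that the CDF at t is the regularized incomplete beta at x = t / (t + rate_rate), which is what makes the censored term log(1 - betainc(...)):

```python
import numpy as np
from scipy import integrate, stats
from scipy.special import betaln, betainc

shape, rate_shape, rate_rate, t = 2.0, 3.0, 4.0, 1.5

# marginal of t ~ Gamma(shape, rate=lam) with lam ~ Gamma(rate_shape, rate=rate_rate)
marginal, _ = integrate.quad(
    lambda lam: stats.gamma.pdf(t, shape, scale=1.0 / lam)
    * stats.gamma.pdf(lam, rate_shape, scale=1.0 / rate_rate),
    0, np.inf)
# closed-form density used for complete observations
closed = np.exp(rate_shape * np.log(rate_rate) + (shape - 1) * np.log(t)
                - betaln(shape, rate_shape) - (shape + rate_shape) * np.log(rate_rate + t))
assert np.isclose(marginal, closed)

# CDF at t equals the regularized incomplete beta at x = t / (t + rate_rate)
cdf, _ = integrate.quad(
    lambda u: np.exp(rate_shape * np.log(rate_rate) + (shape - 1) * np.log(u)
                     - betaln(shape, rate_shape) - (shape + rate_shape) * np.log(rate_rate + u)),
    0, t)
assert np.isclose(cdf, betainc(shape, rate_shape, t / (t + rate_rate)))
```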

Yes, this is the standard way to wrap a scipy/numpy function in Theano. But as you can see, the wrapped op does not provide a gradient, so in general we recommend rewriting the function in Theano so you can take advantage of modern gradient-based samplers (NUTS).

Yep, think I’m starting to get the hang of this :). Thank you again for all the help.

By the way, is there any plan to incorporate pymc with Spark once the switch to TFP happens in v4?

Not currently, but hopefully there will be some TensorFlow + Spark infrastructure by then that we can borrow.