Issues fitting a toy Lomax model example

nickresnick · May 22, 2018, 10:35pm

I have a pretty simple example that doesn’t seem to work. My goal is to build a Lomax model, and since PyMC3 doesn’t have a Lomax distribution I use the fact that an Exponential mixed with a Gamma is a Lomax (see here):

import pymc3 as pm
from scipy.stats import lomax

# Generate artificial data with a shape and scale parameterization
data = lomax.rvs(c=2.5, scale=3, size=1000)

# if t ~ Exponential(lamda) and lamda ~ Gamma(shape, rate), then t ~ Lomax(shape, rate)
with pm.Model() as hierarchical:
    shape = pm.Uniform('shape', 0, 10)
    rate = pm.Uniform('rate', 0 , 10)
    lamda = pm.Gamma('lamda', alpha=shape, beta=rate)
    t = pm.Exponential('t', lam=lamda, observed=data)
    trace = pm.sample(1000, tune=1000)

The summary is:

>>> pm.summary(trace)
           mean        sd  mc_error   hpd_2.5  hpd_97.5   n_eff      Rhat
shape  4.259874  2.069418  0.060947  0.560821  8.281654  1121.0  1.001785
rate   6.532874  2.399463  0.068837  2.126299  9.998271  1045.0  1.000764
lamda  0.513459  0.015924  0.000472  0.483754  0.545652  1096.0  0.999662

I would expect the shape and rate estimates to be close to 2.5 and 3 respectively. I tried various non-informative priors for shape and rate, including pm.HalfFlat() and pm.Uniform(0, 100) but both resulted in worse fits. Any ideas?

junpenglao · May 23, 2018, 6:01am

Scipy parameterization is sometimes different than the PyMC3 parameterization. For example. st.gamma.logpdf(x, a, scale=1.0/b) in PyMC3 is pm.Gamma.dist(a, b).logp(x). But in this case the parameterizaiton seems to be correct (if what wikipedia said is correct).

I cannot quite find out why this parameterization doesnt work, but in this case it might be easier to port the scipy version into pymc3:

import theano.tensor as tt

def lomax_logp(value, alpha, lam):
    x = value/lam
    _logpdf = tt.log(alpha) - (alpha+1)*tt.log1p(x)
    return _logpdf - tt.log(lam)

with pm.Model() as m:
    # using Flat as a demo, do not do this if you are sampling.
    shape = pm.HalfFlat('shape')
    scale = pm.HalfFlat('scale')
    pm.DensityDist('obs', lomax_logp, 
                   observed=dict(value=data,
                                alpha=shape,
                                lam=scale))
    map1 = pm.find_MAP()
map1

{'scale': array(3.00431805),
 'scale_log__': array(1.10005061),
 'shape': array(2.59470935),
 'shape_log__': array(0.95347451)}

nickresnick · May 24, 2018, 2:17am

Thank you for the reply! I actually discovered my mistake: to derive a Lomax as an exponential mixed with a gamma, I need to fit a lambda for each observation, i.e. lamda = pm.Gamma('lamda', alpha=shape, beta=rate, *shape=len(data)*)

Once I added this, the estimates for shape and rate were what I expected. But thank you for the custom distribution example, that’s very helpful.

Follow up question: Why shouldn’t Flat be used when sampling?

junpenglao · May 24, 2018, 6:05am

Couple of reasons:

It’s not generative, meaning that you cannot generate data from a Flat distribution
It’s unrealistic - Flat prior means you are assuming the parameter can take infinite value.
It’s computationally difficult - it put too much weight in the area with little information, where usually display funnel-like behaviour.

More information could be found in the Stan wiki Prior Choice Recommendations, also the recent paper
The prior can generally only be understood in the context of the likelihood

nickresnick · May 24, 2018, 8:06pm

Thank you, that’s helpful. Sorry, one last question that has come up as I’ve built out this model.

I’m conducting survival analysis, and as such I have censored data. I read this quick example of consored data in the pymc3 docs, and want to do something similar:

def lomax_logp(t, complete_observation, shape, rate):
    scale = 1/rate
    if complete_observation:
        logp = shape * (tt.log(scale) - tt.log(scale + t)) + tt.log(shape) - tt.log(scale + t)
    else:
        logp = shape * (tt.log(scale) - tt.log(scale + t)) # 1 - CDF
    return logp

I ended up using slightly different notation than you did above, but should be the same maths. Then I ran this model:

with pm.Model():
    shape = pm.HalfFlat('shape')
    rate = pm.HalfFlat('rate')
    pm.DensityDist('obs', lomax_logp, observed=dict(t=data_lomax, complete_observation=complete_observations, shape=shape, rate=rate))
    map = pm.find_MAP()

Where complete_observations is an np array of booleans. It isn’t working though because complete_observations is a TensorConstant in lomax_logp. How can I modify this to make it work?

I saw this example in the pymc3 source code, but the approach above seems more readable if we could get it working. Thank you again!

junpenglao · May 24, 2018, 8:45pm

I assume that complete_observations is supposed to be a vector of 0 and 1 to indicate censor or not?
If that’s the case, try something like this:

return shape * (tt.log(scale) - tt.log(scale + t)) + complete_observation*(tt.log(shape) - tt.log(scale + t))

PS, are you sure the tt.log(shape) is correct here? if there are censor value shouldnt you only counting the noncensored ones?

nickresnick · May 25, 2018, 3:57pm

Ah i see, complete_observations needs to be included in the theano statement, thanks. And I actually want to include the censored examples in the likelihood since they’ll inform the distribution of survival, if that’s what you’re asking?

Unrelated: is it possible to coerce scipy functions to valid theano objects? For instance I’m trying to use scipy’s betainc (regularized incomplete beta function) for the cdf of a beta distribution, but getting

TypeError: ufunc 'betainc' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

nickresnick · May 25, 2018, 4:22pm

Actually think I may have solved this?

from scipy.special import betainc as regularized_beta
from theano.compile.ops import as_op
@as_op(itypes=[tt.dscalar, tt.dscalar, tt.dvector], otypes=[tt.dvector])
def reg_beta(a, b, x):
    return regularized_beta(a, b, x)

Using this log probability:

def gamma_gamma_logp(t, complete, shape, rate_shape, rate_rate):
    x = t / (t + rate_rate)
    return complete * (rate_shape * tt.log(rate_rate) + (shape - 1) * tt.log(t) - log_beta(shape, rate_shape) - (shape + rate_shape) * tt.log(rate_rate + t)) + (1 - complete) * tt.log1p(reg_beta(shape, rate_shape, x))

find_MAP() works, though it gives

Warning: gradient not available.(E.g. vars contains discrete variables). MAP estimates may not be accurate for the default parameters. Defaulting to non-gradient minimization 'Powell'.

so I’ll need to investigate that a bit

junpenglao · May 25, 2018, 4:27pm

Yes this is a standard way to wrap scipy/numpy function into theano. But as you can see, wrapping it does not provide gradient, so in general we recommend rewriting the function in theano so you can take advantage of modern samplers (NUTS).

nickresnick · May 25, 2018, 4:52pm

Yep, think I’m starting to get the hang of this :). Thank you again for all the help.

By the way, is there any plan to incorporate pymc with Spark once the switch to TFP happens in v4?

junpenglao · May 25, 2018, 5:45pm

Not currently - but hopefully there are some infrastructure of tensorflow+spark by then that we can borrow.

Topic		Replies	Views
Help with fitting Gamma Distribution Questions	4	5623	February 1, 2019
Python crashes when trying to sample model with covariates Questions	2	1363	February 10, 2021
Using Gamma distribution parameters from SciPy Questions	4	1460	September 28, 2020
Confusion on the use of shape parameter in multinomial likelihood v3 modeling	4	388	July 21, 2023
Hierarchical modeling with a black box likelihood function Questions	2	895	April 3, 2019

Issues fitting a toy Lomax model example

Related topics