I have a pretty simple example that doesn’t seem to work. My goal is to build a Lomax model, and since PyMC3 doesn’t have a Lomax distribution I use the fact that an Exponential mixed with a Gamma is a Lomax (see here):
```python
import pymc3 as pm
from scipy.stats import lomax

# Generate artificial data with a shape and scale parameterization
data = lomax.rvs(c=2.5, scale=3, size=1000)

# if t ~ Exponential(lamda) and lamda ~ Gamma(shape, rate), then t ~ Lomax(shape, rate)
with pm.Model() as hierarchical:
    shape = pm.Uniform('shape', 0, 10)
    rate = pm.Uniform('rate', 0, 10)
    lamda = pm.Gamma('lamda', alpha=shape, beta=rate)
    t = pm.Exponential('t', lam=lamda, observed=data)
    trace = pm.sample(1000, tune=1000)
```
The summary is:
```
           mean        sd    mc_error   hpd_2.5  hpd_97.5   n_eff      Rhat
shape  4.259874  2.069418    0.060947  0.560821  8.281654  1121.0  1.001785
rate   6.532874  2.399463    0.068837  2.126299  9.998271  1045.0  1.000764
lamda  0.513459  0.015924    0.000472  0.483754  0.545652  1096.0  0.999662
```
I would expect the shape and rate estimates to be close to 2.5 and 3 respectively. I tried various non-informative priors for shape and rate, including `pm.Uniform(0, 100)`, but they all resulted in worse fits. Any ideas?
The scipy parameterization is sometimes different from the PyMC3 parameterization. For example, `st.gamma.logpdf(x, a, scale=1.0/b)` in scipy corresponds to `pm.Gamma.dist(a, b).logp(x)` in PyMC3. But in this case the parameterization seems to be correct (if what Wikipedia says is correct).
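That correspondence can be checked numerically with scipy alone. Here is a small sketch (arbitrary values) comparing scipy's `scale=1/b` convention against the rate-based Gamma log-density that PyMC3 uses:

```python
import numpy as np
from scipy import stats
from scipy.special import gammaln

a, b, x = 2.5, 3.0, 1.7  # arbitrary shape, rate, and evaluation point

# Gamma(alpha=a, beta=b) log-density with a rate parameter, written out by hand:
# log p(x) = a*log(b) + (a-1)*log(x) - b*x - log Gamma(a)
rate_param_logp = a * np.log(b) + (a - 1) * np.log(x) - b * x - gammaln(a)

# scipy uses a scale parameter, so rate b becomes scale = 1/b
scipy_logp = stats.gamma.logpdf(x, a, scale=1.0 / b)

print(np.isclose(rate_param_logp, scipy_logp))  # True
```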
I cannot quite work out why this parameterization doesn't work, but in this case it might be easier to port the scipy version into PyMC3:

```python
import theano.tensor as tt

def lomax_logp(value, alpha, lam):
    # log pdf of Lomax(alpha, lam): log(alpha/lam) - (alpha + 1)*log(1 + value/lam)
    x = value / lam
    _logpdf = tt.log(alpha) - (alpha + 1) * tt.log1p(x)
    return _logpdf - tt.log(lam)
```
```python
with pm.Model() as m:
    # using HalfFlat as a demo, do not do this if you are sampling.
    shape = pm.HalfFlat('shape')
    scale = pm.HalfFlat('scale')
    t = pm.DensityDist('t', lomax_logp,
                       observed=dict(value=data, alpha=shape, lam=scale))
    map1 = pm.find_MAP()
```
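As a quick sanity check on that log-density, here is a numpy stand-in for the theano version (arbitrary test values), compared against scipy's `lomax.logpdf`:

```python
import numpy as np
from scipy.stats import lomax

def lomax_logp_np(value, alpha, lam):
    # numpy twin of the theano lomax_logp above
    x = value / lam
    return np.log(alpha) - (alpha + 1) * np.log1p(x) - np.log(lam)

t = np.array([0.5, 2.0, 7.3])
print(np.allclose(lomax_logp_np(t, 2.5, 3.0),
                  lomax.logpdf(t, c=2.5, scale=3.0)))  # True
```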
Thank you for the reply! I actually discovered my mistake: to derive a Lomax as an Exponential mixed with a Gamma, I need to fit a lambda for each observation, i.e.

```python
lamda = pm.Gamma('lamda', alpha=shape, beta=rate, shape=len(data))
```

Once I added this, the estimates for `rate` were what I expected. But thank you for the custom distribution example, that's very helpful.
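For reference, the Exponential-Gamma mixture identity itself is easy to confirm by simulation with scipy (a sketch using the same parameters as the artificial data):

```python
import numpy as np
from scipy.stats import gamma, expon, lomax

rng = np.random.default_rng(0)
shape_, rate_ = 2.5, 3.0
n = 200_000

# draw one lamda per observation, then an exponential waiting time for each
lamda = gamma.rvs(a=shape_, scale=1.0 / rate_, size=n, random_state=rng)
t_mixture = expon.rvs(scale=1.0 / lamda, random_state=rng)

# direct Lomax draws with the same shape and scale
t_lomax = lomax.rvs(c=shape_, scale=rate_, size=n, random_state=rng)

# Lomax(2.5, 3) has mean 3 / (2.5 - 1) = 2; both sample means should land close to it
print(t_mixture.mean(), t_lomax.mean())
```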
Follow up question: Why shouldn’t Flat be used when sampling?
A couple of reasons:

- It's not generative, meaning that you cannot generate data from a Flat distribution.
- It's unrealistic: a Flat prior means you are assuming the parameter can take infinitely large values.
- It's computationally difficult: it puts too much weight in areas with little information, which usually display funnel-like behaviour.
More information can be found in the Stan wiki's Prior Choice Recommendations, and also in the recent paper "The prior can generally only be understood in the context of the likelihood".
Thank you, that's helpful. Sorry, one last question that has come up as I've built out this model.

I'm conducting survival analysis, and as such I have censored data. I read this quick example of censored data in the PyMC3 docs and want to do something similar:
```python
def lomax_logp(t, complete_observation, shape, rate):
    scale = 1 / rate
    if complete_observation:
        # complete observation: log PDF
        logp = shape * (tt.log(scale) - tt.log(scale + t)) + tt.log(shape) - tt.log(scale + t)
    else:
        # censored observation: log(1 - CDF)
        logp = shape * (tt.log(scale) - tt.log(scale + t))
    return logp
```
I ended up using slightly different notation than you did above, but should be the same maths. Then I ran this model:
```python
with pm.Model() as model:
    shape = pm.HalfFlat('shape')
    rate = pm.HalfFlat('rate')
    pm.DensityDist('obs', lomax_logp,
                   observed=dict(t=data_lomax,
                                 complete_observation=complete_observations,
                                 shape=shape, rate=rate))
    map_estimate = pm.find_MAP()
```
`complete_observations` is a NumPy array of booleans. It isn't working, though, because `complete_observations` is a TensorConstant inside `lomax_logp`. How can I modify this to make it work?

I saw this example in the PyMC3 source code, but the approach above seems more readable if we can get it working. Thank you again!
I assume that `complete_observations` is supposed to be a vector of 0s and 1s indicating whether each observation is censored?

If that's the case, try something like this:

```python
return (shape * (tt.log(scale) - tt.log(scale + t))
        + complete_observation * (tt.log(shape) - tt.log(scale + t)))
```
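As a numerical check of that combined expression (numpy in place of theano, made-up values, parameterizing by the scale directly): complete observations should recover scipy's `logpdf` and censored ones its `logsf`:

```python
import numpy as np
from scipy.stats import lomax

def censored_lomax_logp_np(t, complete, shape, scale):
    # complete = 1 -> log pdf; complete = 0 -> log(1 - CDF)
    return (shape * (np.log(scale) - np.log(scale + t))
            + complete * (np.log(shape) - np.log(scale + t)))

t = np.array([0.4, 1.5, 6.0, 2.2])
complete = np.array([1, 0, 1, 0])
expected = np.where(complete == 1,
                    lomax.logpdf(t, c=2.5, scale=3.0),
                    lomax.logsf(t, c=2.5, scale=3.0))
print(np.allclose(censored_lomax_logp_np(t, complete, 2.5, 3.0), expected))  # True
```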
PS: are you sure the `tt.log(shape)` is correct here? If there are censored values, shouldn't you only count it for the non-censored ones?
Ah, I see: `complete_observations` needs to be included in the theano statement, thanks. And I actually do want to include the censored examples in the likelihood, since they'll inform the distribution of survival, if that's what you're asking?
Unrelated: is it possible to coerce scipy functions into valid theano objects? For instance, I'm trying to use scipy's `betainc` (regularized incomplete beta function) for the CDF of a beta distribution, but I'm getting

```
TypeError: ufunc 'betainc' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
```
Actually, I think I may have solved this?

```python
from scipy.special import betainc as regularized_beta
from theano.compile.ops import as_op

@as_op(itypes=[tt.dscalar, tt.dscalar, tt.dvector], otypes=[tt.dvector])
def reg_beta(a, b, x):
    return regularized_beta(a, b, x)
```
Using this log probability:

```python
def gamma_gamma_logp(t, complete, shape, rate_shape, rate_rate):
    # log_beta (the log of the beta function) is assumed to be defined elsewhere
    x = t / (t + rate_rate)
    # censored part: log1p(-reg_beta(...)) = log(1 - CDF)
    return (complete * (rate_shape * tt.log(rate_rate)
                        + (shape - 1) * tt.log(t)
                        - log_beta(shape, rate_shape)
                        - (shape + rate_shape) * tt.log(rate_rate + t))
            + (1 - complete) * tt.log1p(-reg_beta(shape, rate_shape, x)))
```
`find_MAP()` works, though it gives

```
Warning: gradient not available.(E.g. vars contains discrete variables). MAP estimates may not be accurate for the default parameters. Defaulting to non-gradient minimization 'Powell'.
```

so I'll need to investigate that a bit.
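As a side check on the complete-observation part of that density (numpy, with `scipy.special.betaln` standing in for `log_beta`, and made-up parameter values), the pdf should integrate to 1:

```python
import numpy as np
from scipy.special import betaln
from scipy.integrate import quad

# made-up parameter values; betaln stands in for log_beta
shape_, rate_shape_, rate_rate_ = 2.0, 3.0, 1.5

def complete_logp(t):
    # the complete-observation term of gamma_gamma_logp, in numpy
    return (rate_shape_ * np.log(rate_rate_) + (shape_ - 1) * np.log(t)
            - betaln(shape_, rate_shape_)
            - (shape_ + rate_shape_) * np.log(rate_rate_ + t))

total, _ = quad(lambda t: np.exp(complete_logp(t)), 0, np.inf)
print(round(total, 4))  # 1.0
```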
Yes, this is the standard way to wrap a scipy/numpy function into theano. But as you can see, wrapping it does not provide a gradient, so in general we recommend rewriting the function in theano so you can take advantage of modern samplers (NUTS).
Yep, think I’m starting to get the hang of this :). Thank you again for all the help.
By the way, is there any plan to incorporate pymc with Spark once the switch to TFP happens in v4?
Not currently, but hopefully there will be some tensorflow + spark infrastructure by then that we can borrow.