Dealing with negative values and the exponential distribution

Hi everyone,

a very newbie question:

I’m looking to model time between events with a relatively large range (~1-6000 seconds). The empirical data is well described by an exponential distribution, eg:

with pm.Model() as model0:

    a = pm.Gamma("a", mu=0.1, sigma=0.5)
    sec = pm.Exponential("sec", a, observed=d0.tdeltas)
    trace0 = pm.sample(1000, tune=1000)

I run into issues when I want to model additional effects. For example, an effect of time. I have ~10000 sequential observations, which I turn into a ‘time’ variable as follows (I realize this can be done more accurately later by using actual time between events):

time = np.arange( len(d0) )
time = (time - time.mean()) / time.std()

As the effect of time could be positive or negative, I wanted to use a Gaussian:

with pm.Model() as model0:

    a = pm.Gamma("a", mu=0.1, sigma=0.5)
    b = pm.Normal("b", mu=0, sd=1)
    lam = pm.Deterministic("lam", a + b*time)
    sec = pm.Exponential("sec", lam, observed=d0.tdeltas)
    trace0 = pm.sample(1000, tune=1000)

This results in zero divison errors and “initial evaluation errors” like so:

SamplingError: Initial evaluation of model at starting point failed!
Starting values:
{'a_log__': array(-1.620602, dtype=float32), 'b': array(0.9321959, dtype=float32)}

Initial evaluation results:
a_log__   -3.38
b         -1.35
sec        -inf
Name: Log-probability of test_point, dtype: float32

I assume this has to do with the fact that some samples could result in negative (and thus invalid) values for lam? I haven’t found a solution yet going through the documentation/forums and would be very thankful for any pointers!

1 Like

Yes that is correct. The exponential distribution requires lam to be a positive quantity and your b parameter could be initialized at a negative value.

An alternative approach to modelling that comes to the top of my mind would be to log your data and fit it in a linear model. Perhaps something like:

a = pm.Gamma("a", mu=0.1, sigma=0.5) # maybe use a Normal prior here?
b = pm.Normal("b", mu=0, sd=1)
lam = pm.Deterministic("lam", a + b*time)
sec = pm.Normal("sec", lam, observed=np.log(d0.tdeltas))

trace0 = pm.sample(1000, tune=1000)

Thanks a lot for your suggestion! (And someone for formatting my post:))

I’ve experimented with it and I’m wondering whether it’s a problem that the resulting distribution (after taking the log) is not really Gaussian?

I’ve been playing around with a simplified version to understand log-transformations:

with pm.Model() as model0:
    a = pm.Normal("a", mu=6, sd=1)
    b = pm.Exponential("b", 1/2)
    sec = pm.Normal("sec", mu=a, sd=b, observed=np.log(d0.tdeltas+1e-8))
    trace0 = pm.sample(1000, tune=2000)

Whereas the empirical mean is close to 600, I can’t recover this accurately:

with model0:
    ppc = pm.sample_posterior_predictive(trace1)
az.plot_ppc(az.from_pymc3(posterior_predictive=ppc, model=model0));
print("mean=", np.exp(ppc["sec"]).mean())
mean= 734.969762489091


(Not sure why my posterior mean is larger than the empirical mean - the ppc plot seems to indicate the opposite.)

Anyway - I just wanted to check whether I’m misinterpreting something or whether this is an unavoidable downside of transforming my data?

I would suggest looking at either Poisson regression or negative binomial regression for inspiration/ideas. These models are traditionally used to model counts and other non-negative outcome variables (e.g., length of inter-event intervals). In Poisson, you construct a linear expression (e.g., \alpha + \beta X) and then the (canonical) link function is an exponential (so E[Y]=e^{\alpha + \beta X} or, ln(E[Y])=\alpha + \beta X, which was similar to the suggestion by @larryshamalama ) which forces the expectation to be non-negative. However, as you note, the transformation warps things (e.g., a prior over \beta that is normally distributed in the linear expression will not “look” normal in the transformed space). But hopefully that will give you some resources to check out!


Thank you for the suggestions, I’m going to look into it!

Much appreciated.

1 Like