Dealing with negative values and the exponential distribution

Hi everyone,

a very newbie question:

I’m looking to model time between events with a relatively large range (~1-6000 seconds). The empirical data is well described by an exponential distribution, e.g.:

with pm.Model() as model0:

    a = pm.Gamma("a", mu=0.1, sigma=0.5)
    sec = pm.Exponential("sec", a, observed=d0.tdeltas)
    trace0 = pm.sample(1000, tune=1000)

I run into issues when I want to model additional effects, for example an effect of time. I have ~10,000 sequential observations, which I turn into a ‘time’ variable as follows (I realize this can be done more accurately later by using the actual time between events):

time = np.arange( len(d0) )
time = (time - time.mean()) / time.std()

As the effect of time could be positive or negative, I wanted to use a Gaussian:

with pm.Model() as model0:

    a = pm.Gamma("a", mu=0.1, sigma=0.5)
    b = pm.Normal("b", mu=0, sd=1)
    lam = pm.Deterministic("lam", a + b*time)
    sec = pm.Exponential("sec", lam, observed=d0.tdeltas)
    
    trace0 = pm.sample(1000, tune=1000)

This results in zero division errors and “initial evaluation errors” like so:

SamplingError: Initial evaluation of model at starting point failed!
Starting values:
{'a_log__': array(-1.620602, dtype=float32), 'b': array(0.9321959, dtype=float32)}

Initial evaluation results:
a_log__   -3.38
b         -1.35
sec        -inf
Name: Log-probability of test_point, dtype: float32

I assume this has to do with the fact that some samples could result in negative (and thus invalid) values for lam? I haven’t found a solution yet going through the documentation/forums and would be very thankful for any pointers!


Yes, that is correct. The exponential distribution requires lam to be a positive quantity, and your b parameter can be initialized at a negative value, which can push a + b*time below zero for some observations.
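
One direct fix would be to keep the linear predictor on the log scale, so that lam stays positive no matter which value b takes. A rough, untested sketch reusing your time and d0.tdeltas (the priors are placeholders you would want to adjust):

with pm.Model() as model_loglink:
    a = pm.Normal("a", mu=0, sd=1)
    b = pm.Normal("b", mu=0, sd=1)
    # exp() keeps the rate positive even when a + b*time is negative
    lam = pm.Deterministic("lam", pm.math.exp(a + b*time))
    sec = pm.Exponential("sec", lam, observed=d0.tdeltas)
    trace_loglink = pm.sample(1000, tune=1000)

With this parameterization the expected waiting time is 1/lam = exp(-(a + b*time)), so a positive b corresponds to shorter intervals later in the sequence.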

An alternative approach that comes to mind would be to log your data and fit a linear model to it. Perhaps something like:

with pm.Model() as model1:
    a = pm.Gamma("a", mu=0.1, sigma=0.5)  # maybe use a Normal prior here?
    b = pm.Normal("b", mu=0, sd=1)
    lam = pm.Deterministic("lam", a + b*time)
    # note: the likelihood sd defaults to 1 log-unit here; you may want an explicit noise prior
    sec = pm.Normal("sec", lam, observed=np.log(d0.tdeltas))

    trace1 = pm.sample(1000, tune=1000)
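
If you go this route, the posterior predictive draws come out in log-seconds, so you would exponentiate them to get back to the original scale; a minimal sketch using the model1/trace1 names from the snippet above:

with model1:
    ppc = pm.sample_posterior_predictive(trace1)

# draws are log-seconds; exponentiate to return to seconds
sec_pred = np.exp(ppc["sec"])
print("predicted mean (s):", sec_pred.mean())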

Thanks a lot for your suggestion! (And thanks to whoever formatted my post :))

I’ve experimented with it and I’m wondering whether it’s a problem that the resulting distribution (after taking the log) is not really Gaussian?

I’ve been playing around with a simplified version to understand log-transformations:

with pm.Model() as model0:
    a = pm.Normal("a", mu=6, sd=1)
    b = pm.Exponential("b", 1/2)
    sec = pm.Normal("sec", mu=a, sd=b, observed=np.log(d0.tdeltas+1e-8))
    
    trace0 = pm.sample(1000, tune=2000)

The empirical mean is close to 600, but I can’t recover this accurately:

with model0:
    ppc = pm.sample_posterior_predictive(trace0)
    
az.plot_ppc(az.from_pymc3(posterior_predictive=ppc, model=model0));
print("mean=", np.exp(ppc["sec"]).mean())
mean= 734.969762489091


(Not sure why my posterior mean is larger than the empirical mean - the ppc plot seems to indicate the opposite.)

Anyway - I just wanted to check whether I’m misinterpreting something or whether this is an unavoidable downside of transforming my data?
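
Regarding the mean discrepancy: if the log of the data is modelled as Normal with mean mu and standard deviation sigma, the implied mean on the original scale is exp(mu + sigma^2/2), which is larger than exp(mu) whenever sigma > 0. A quick numeric check with purely hypothetical values:

import numpy as np

mu, sigma = 6.0, 1.0  # hypothetical posterior values for the log-scale model
print(np.exp(mu))                 # back-transformed mean of the logs: ~403
print(np.exp(mu + sigma**2 / 2))  # implied mean on the original scale: ~665

So a back-transformed posterior mean sitting above the empirical mean is not necessarily a sign that something went wrong; the log-scale model targets the mean of the logs, not the mean of the raw seconds.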

I would suggest looking at either Poisson regression or negative binomial regression for inspiration/ideas. These models are traditionally used to model counts and other non-negative outcome variables (e.g., the length of inter-event intervals). In Poisson regression, you construct a linear expression (e.g., $\alpha + \beta X$) and the (canonical) link function is an exponential (so $E[Y]=e^{\alpha + \beta X}$, or $\ln(E[Y])=\alpha + \beta X$, which is similar to the suggestion by @larryshamalama), which forces the expectation to be non-negative. However, as you note, the transformation warps things (e.g., a prior over $\beta$ that is normally distributed in the linear expression will not “look” normal in the transformed space). But hopefully that gives you some resources to check out!
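
For a concrete starting point, here is a minimal Poisson-regression sketch with the canonical log link, using synthetic X and y purely for illustration:

import numpy as np
import pymc3 as pm

# synthetic data, purely for illustration
rng = np.random.default_rng(0)
X = rng.normal(size=200)
y = rng.poisson(lam=np.exp(0.5 + 0.3*X))

with pm.Model() as poisson_glm:
    alpha = pm.Normal("alpha", mu=0, sd=1)
    beta = pm.Normal("beta", mu=0, sd=1)
    # canonical log link: ln(E[Y]) = alpha + beta*X, so E[Y] = exp(alpha + beta*X) > 0
    mu = pm.math.exp(alpha + beta*X)
    y_obs = pm.Poisson("y_obs", mu=mu, observed=y)
    trace_glm = pm.sample(1000, tune=1000)

The same log-link idea carries over to a NegativeBinomial likelihood by adding a dispersion parameter.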


Thank you for the suggestions; I’m going to look into them!

Much appreciated.
