Modeling a noise parameter that decreases (negatively varying variance)

Hello!

This question builds on a topic/toy-model that I posted here, where @ricardoV94 helped me a lot.

It’s a really simple asymptote function:

y \sim N\left(\mu = \frac{x}{x+k},\ \sigma = \epsilon\right)

My real data is an ecological distance-decay curve. The noise around the asymptote function is probably best modeled as a linear function of distance, where the y-intercept is our maximum initial variance and the negative slope is the rate at which the variance decreases with distance. By the farthest distances there is very little variance left; the data converge onto the theoretical/deterministic model:

\epsilon = \gamma + \delta x, \qquad -1 < \delta < 0

Where γ describes the initial, widest, noisiest area of the curve, and δ, which is negative, describes how fast the variance tightens up. For example, here is some toy data with a starting SD of 0.2 and a tightening rate of 0.2:

import numpy as np
import pandas as pd

xx = np.arange(0, 3500, 100)
yy_real = xx / (100 + xx)
## normalize xx so the SD shrinks from 0.2 down to 0 across the range
xnorm = pd.Series(xx / xx.max())
ee = xnorm.apply(lambda x: np.random.normal(0, 0.2 - 0.2 * x)).values
yy = yy_real + ee

Which looks like this:
[image: asymptote_vv]

I’m trying to model it like this:

import pymc3 as pm

with pm.Model() as model_vv:
    γ = pm.Gamma('γ', mu=0.2, sigma=0.01)    ## y-intercept: the initial, maximum spread of the variance
    δ = pm.Normal('δ', mu=-0.2, sigma=0.05)  ## slope: rate of tightening of the variance
    κ = pm.Normal('κ', mu=100, sigma=10)     ## same old k
    μ = pm.Deterministic('μ', xx / (xx + κ))
    ε = pm.Deterministic('ε', γ + δ * xnorm)
    y_pred = pm.Normal('y_pred', mu=μ, sigma=ε, observed=yy)
    trace_g = pm.sample(2000, tune=1000)

The syntax seems fine; no errors are returned. But the results are not usable: there are a ton of divergences, and I can’t plot any of the variables from the posterior. The following warnings are given:

Auto-assigning NUTS sampler…
Initializing NUTS using jitter+adapt_diag…
Multiprocess sampling (2 chains in 2 jobs)
NUTS: [κ, δ, γ]
Sampling 2 chains, 2,318 divergences: 100%|| 6000/6000 [00:25<00:00, 235.95draws/s]
There were 1322 divergences after tuning. Increase target_accept or reparameterize.
The acceptance probability does not match the target. It is 0.6121013638468412, but should be close to 0.8. Try to increase the number of tuning steps.
There were 995 divergences after tuning. Increase target_accept or reparameterize.
The rhat statistic is larger than 1.05 for some parameters. This indicates slight problems during sampling.
The estimated number of effective samples is smaller than 200 for some parameters.

Does this mean that the above data are simply too noisy to fit to this curve? My real data resemble this toy data pretty well (just +1800 pts), and I’m increasingly convinced the above conceptual model is sound.

Thanks in advance, and sorry for the stupid questions; I’m just getting used to Bayesian methods.

Dan

I played around (a tiny bit) with the code you provided. I can’t even get it to run as-is. This seems to be due to the fact that the SD of y_pred is very likely to be \le 0 for some values of xnorm. If you are going to use a linear expression to characterize the scale of a parameter, you might consider respecifying things so that you exponentiate the linear portion (e.g., \sigma = e^{\gamma + \delta \cdot xnorm}). This would require you to respecify some of your priors (e.g., γ and δ), but it would ensure that the SD is strictly positive. You could also bound ε using pm.Bound(), but that would present its own sampling problems (MCMC doesn’t like arbitrary boundaries).
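
Something like this, to be concrete (a rough sketch reusing the xx, xnorm, and yy from your toy data; the particular prior values here are placeholders I made up for illustration, not recommendations):

import numpy as np
import pymc3 as pm

with pm.Model() as model_exp:  ## sketch only; priors are illustrative placeholders
    γ = pm.Normal('γ', mu=np.log(0.2), sigma=0.5)  ## now on the log scale, so it may be negative
    δ = pm.Normal('δ', mu=-0.2, sigma=0.1)         ## slope on the log scale
    κ = pm.Normal('κ', mu=100, sigma=10)
    μ = pm.Deterministic('μ', xx / (xx + κ))
    ε = pm.Deterministic('ε', pm.math.exp(γ + δ * xnorm.values))  ## strictly positive SD
    y_pred = pm.Normal('y_pred', mu=μ, sigma=ε, observed=yy)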

Thanks for checking the model. It turns out that, as written above, the model would run for me about half the time (giving the funky results mentioned above); the other half of the time it gave a “bad initial energy” error, I assume for the reasons you give. I am still having trouble with the model even after exponentiation, but I will continue to tinker with that and report back later.

But before I go much longer on this, I wanted to ask: are there other ways that come to mind to characterize this tightening-variance phenomenon? Your answer makes me think that maybe an (exponentiated) linear expression isn’t the only approach here. This decreasing variance has turned out to be very tricky for me to implement (probably just because I’m new to pymc3). I’ve been sticking with it because I think it reflects some of the processes structuring my actual data, in a way similar to the graph I showed above.

Hi @danchurch, could you provide the model, data, and priors from one of those failing runs that leads to bad initial energy?

Since you are fitting your model to data that you know was generated like this, it should be possible to have it sample okay-ish.

Hi @ricardoV94 - are we talking about the code I posted above, at the beginning of this thread?
If so, that code is the model, the priors, and the toy observed data. At least with my setup, copying and pasting that code recreates the errors. If you re-run the whole code block multiple times (including the toy-data generation), it should behave as described, sometimes “working” and sometimes throwing a bad initial energy error.

The only thing that changes between “successful” runs (= no error returned, even though the chains don’t converge) and unsuccessful ones (error message = “bad initial energy, y_pred -inf”) is the observed data yy. I think this is because yy carries some random error that changes a little every time you rerun the whole block of code, including the toy-data generation. The bad initial energy errors are usually thrown early in the sampling, or not at all.
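
(If it helps in reproducing this: fixing NumPy’s seed before the toy-data block pins yy, and therefore pins which behavior you get. The seed value below is arbitrary.)

np.random.seed(42)  ## arbitrary seed; makes the toy data, and hence the error, repeatable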

But if you’re talking about the exponentiated model, I’m still tinkering with that…

I was talking about the exponentiated model, yes. It’s not surprising that your original model fails, for the reasons @cluhmann mentioned above, but I was curious about the exponentiated one.

Okay, thanks. Will post tonight, if I can’t make it work.

Okay @ricardoV94 and @cluhmann, if you’re still there: this is my attempt to model this with an exponentiated linear model. I still get bad initial energy errors, and this time the sampling never completes; the error is thrown very early in the process. I know I’m probably still somehow asking the sampler to examine mathematical impossibilities…

xx = np.arange(1, 3501, 20)
yy_real = xx / (100 + xx)
γ_mean = np.log(0.2)  ## log of the initial SD of 0.2
δ_mean = -0.0004
a = lambda x: np.exp(δ_mean * x + γ_mean)
ee = np.random.normal(0, a(xx))
yy = yy_real + ee

with pm.Model() as model_vv:
    γ = pm.Gamma('γ', mu=-1.5, sigma=0.5)        ## analogous to the initial spread of variance (natural log)
    δ = pm.Normal('δ', mu=-0.0004, sigma=0.0002) ## slope: rate of tightening of the variance
    κ = pm.Normal('κ', mu=100, sigma=10)         ## same old k
    μ = pm.Deterministic('μ', xx / (xx + κ))
    ε = pm.Deterministic('ε', pm.math.exp(γ + δ * xx))
    y_pred = pm.Normal('y_pred', mu=μ, sigma=ε, observed=yy)
    trace_g = pm.sample(2000, tune=1000)

The error thrown is:

Bad initial energy, check any log probabilities that are inf or -inf, nan or very small:
γ_log__ NaN
y_pred -inf

The toy data looks like this:

[image: vvToy]

The mean of the gamma must be positive:

    γ = pm.Gamma('γ', mu=-1.5, sigma=0.5)
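
As a general debugging aid for “bad initial energy”, model.check_test_point() shows the log-probability of each variable at the starting point, which makes the offending variable easy to spot:

print(model_vv.check_test_point())  ## the γ_log__ NaN from the error above shows up here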

I also expect that this prior is too tight (the SD is too small) to allow for decent sampling. But that’s a guess.
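
One quick way to see what a set of priors implies (a plain-NumPy sketch; the values mirror the model above) is to push prior draws through the exponentiated expression and look at the resulting SDs:

## rough prior check: what SDs do these priors imply at the far end of x?
γ_draws = np.random.normal(-1.5, 0.5, 1000)   ## treating γ as Normal on the log scale
δ_draws = np.random.normal(-0.0004, 0.0002, 1000)
eps_far = np.exp(γ_draws + δ_draws * 3500)    ## implied SD at x = 3500
print(np.percentile(eps_far, [5, 50, 95]))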


@ricardoV94 Thanks for catching that. Since I am exponentiating the noising process now, I think it would be safe to switch over to a normally-distributed γ? With the exponentiated version I now need a negative \gamma. So this seems more intuitive to me.

@cluhmann I have tried increasing the variance of \delta, with a range of values up to sigma=1. Nothing seems to work; the bad initial energy error is still thrown. Because of the exponentiation, I have to keep \delta pretty small.

Latest version, still throwing errors:

xx = np.arange(1, 3501, 20)
yy_real = xx / (100 + xx)
γ_mean = np.log(0.2)  ## log of the initial SD of 0.2
δ_mean = -0.0004
a = lambda x: np.exp(δ_mean * x + γ_mean)
ee = np.random.normal(0, a(xx))
yy = yy_real + ee

with pm.Model() as model_vv:
    γ = pm.Normal('γ', mu=-1.5, sigma=0.5)     ## analogous to the initial spread of variance (natural log)
    #γ = pm.Gamma('γ', mu=1.5, sigma=0.5)      ## gamma version
    δ = pm.Normal('δ', mu=-0.0004, sigma=0.05) ## slope: rate of tightening of the variance
    κ = pm.Normal('κ', mu=100, sigma=10)       ## same old k
    μ = pm.Deterministic('μ', xx / (xx + κ))
    ε = pm.Deterministic('ε', pm.math.exp(γ + δ * xx))
    #ε = pm.Deterministic('ε', pm.math.exp(-γ + δ * xx)) ## use with the gamma version (negated coefficient)
    y_pred = pm.Normal('y_pred', mu=μ, sigma=ε, observed=yy)
    trace_g = pm.sample(2000, tune=1000)

Your model works for me (though I didn’t test it thoroughly by any means) if you call pm.sample(init='adapt_diag') instead of pm.sample(). The default is pm.sample(init='jitter+adapt_diag'), but the jittering can sometimes cause problems during initialization (though not during sampling proper), e.g., when the scale of a parameter is very small.
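
Concretely, for the model above that would be:

with model_vv:
    trace_g = pm.sample(2000, tune=1000, init='adapt_diag')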

Yes, it works great for me now. The image below shows the result using the first and second SDs of the posterior:

[image: vvToy]
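
Roughly how the bands can be drawn from the trace, for anyone following along (a sketch; the plotting details are approximate):

import matplotlib.pyplot as plt

μ_post = trace_g['μ'].mean(axis=0)  ## posterior mean of the asymptote curve
ε_post = trace_g['ε'].mean(axis=0)  ## posterior mean of the (shrinking) SD
plt.scatter(xx, yy, s=10, alpha=0.6)
plt.plot(xx, μ_post, color='C1')
for n in (1, 2):                    ## bands at ±1 and ±2 SD
    plt.fill_between(xx, μ_post - n * ε_post, μ_post + n * ε_post, color='C1', alpha=0.15)
plt.show()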

Will probably have another post coming soon as I now apply this to my real data…

I’d like to mark a solution here, but this was a pretty organic process; not sure where to call it. Regardless, thanks for all the help, @cluhmann and @ricardoV94.


Those posteriors look really neat!

Do you mind if I borrow your toy example for an article I am writing on using log transformations in Bayesian models?

Don’t mind at all. @cluhmann contributed the transformation ideas.

I should also say that the initial inspiration for the varying variance came from @aloctavodia’s examples in Bayesian Analysis with Python, so my stuff looks a lot like it was ripped off from him (it was!).


Looks good! I wouldn’t worry too much about the details of marking a solution. As long as something is marked as a solution, the thread will appear (e.g., to future browsers) to have a solution somewhere inside.