# Gamma noise prior, upside-down?

Hello again PyMC3 community,

As per here and here, I’ve been working on building a model, an asymptote function with a noise prior:

y \sim N(\mu = \frac{x}{x+k}, \epsilon )
\{ 0 \le y \le 1 \}, \{ 0 \le x \}

With a lot of help from @ricardoV94 and @cluhmann I have defined the noise prior \epsilon as a linearly decreasing function around \mu:

\epsilon = e^{(\delta x + \gamma)}

At any given point along the x-axis, the data is very dense around the above proposed asymptote mean function, but usually also has a skew towards y=0:

Because of this, I’d like to replace the normal distribution prior for \epsilon used in model above with a model of the variance as a gamma or other skewed distribution, but with tail of the distribution towards smaller y-values. The following seems to works fine, no errors:

x = physDist.copy()
with pm.Model() as model_gamma:
γ = pm.Normal('γ', mu=1.5, sigma = 0.5) ## analagous to initial spread of variance (natural log of)
δ = pm.Normal('δ', mu=-0.0006, sigma=0.005) ## slope, rate of tightening of variance
κ = pm.Normal('κ', mu=200, sigma = 10) ## same old k
μ = pm.Deterministic('μ', xx/(xx+κ))
ε = pm.Deterministic('ε', np.power(np.e, -γ + δ*xx))
y_pred = pm.Gamma('y_pred', mu=μ, sd=ε, observed=bcDist)


But I think I need to invert the distribution of y_pred, so that the tail of the variance points downward. How can I do this in the pymc3 syntax?

Gamma seems tough in this sort of application because of its bounded support ([0,\infty]). A more minor modification might be to replace the normal with a skew normal, and then have the skew parametrically vary as a function of xx similar to the way that your scale is a function of xx. Would that get you what you are looking for?

2 Likes

Thanks for getting back to me on this @cluhmann. I tried a skewed-normal distribution, in the following way:

xx = physDist.copy()
with pm.Model() as model_skewNormal:
γ = pm.Normal('γ', mu=1.5, sigma = 0.5) ## analagous to initial spread of variance (natural log of)
δ = pm.Normal('δ', mu=-0.0006, sigma=0.005) ## slope, rate of tightening of variance
κ = pm.Normal('κ', mu=200, sigma = 10) ## same old k
α = pm.Normal('α', mu=-1.0, sigma = 0.5)
μ = pm.Deterministic('μ', xx/(xx+κ))
ε = pm.Deterministic('ε', np.power(np.e, -γ + δ*xx))
y_pred = pm.SkewNormal('y_pred', mu=μ, sd=ε, alpha=α, observed=bcDist)


This works, but the results are really far shifted from my priors, and the mean function \mu of the posterior sits pretty far out from the rest of the distribution. I guess this is possible because of the skew? Using the skewed normal distribution pushed the asymptote into an almost right-angle pattern. I think because the low X values \{x \lt 500\} in the steep area of the asymptote are difficult to fit with this skewed distribution. Looks like this :

Orange areas representing HPDs of 50 and 94 credible intervals. The general fit of the skewed \epsilon model is pretty poor, R2= 0.13 .

This can be compared to when I assign a non-skewed normal distribution to \epsilon, which doesn’t respect the theoretical limit of my data ( y\leq 1 ) but that predicts a lot more of the variance around the mean ( R2=0.53):

To keep it simple I think that a non-skewed noising will do just fine for my purposes, maybe best to leave good-enough alone here. So I think you answered the “how-to-do-it” of my question, for anyone else who might end up here, and I figured out that my idea wasn’t great for my data 1 Like

In general, my own preference is to model the data according to (what I believe are) the characteristics of actual generative processes while also considering what your goals are. The toy data you were using previously had no upper bound at y=1, so the symmetric noise seemed fine. The skew normal noise seems to fit this new data quite well (to my eye). In contrast, the model with the symmetric normal noise looks quite poor. For example, credible values of y for x<500 look highly inconsistent with your data. In other words, your data looks very consistent with the “right angle” pattern you seem to be unhappy about. I’m not sure where your R^2 is coming from, but the eye-ball test would seem to prefer the skew normal model.

This works, but the results are really far shifted from my priors

This isn’t a problem on it’s own. Data should be allowed to shift your posterior from from your priors. If this isn’t possible, you can do without data.

and the mean function μ of the posterior sits pretty far out from the rest of the distribution.

The mean of the skew normal is not \mu, but rather \mu + \sigma \sqrt{\frac{2}{\pi}} \frac {\alpha }{{\sqrt {1+\alpha ^{2}}}}. Similarly, \sigma is not the variance. Details here.

Thanks for the critical eye here.

I’m not sure where your R2 is coming from, but the eye-ball test would seem to prefer the skew normal model.

I was used the az.r2_score for each model fit:

>>>az.r2_score(bcDist, ppc[‘y_pred’])
r2 0.516865
r2_std 0.057976

When we run the same on the skewed model, it reports a score of R2 ~ 0.13.

In general, my own preference is to model the data according to (what I believe are) the characteristics of actual generative processes while also considering what your goals are.

I agree with this, of course. So some info, to explain my doubts: this data is from an ecological distance decay study of tree communities, from a mountainous tropical region. Each data point is a comparison between the plants at one site with another site: X is the distance between them in meters, Y is the dissimilarity coefficient, where 1 is totally different plants and 0 all-same plants.

While building this model, I’m trying to account for two major influences on the pattern of decay of similarity: First is the expected behavior of dissimilarity from a geographical perspective. Generally, comparisons that are close (= low x-values) are expected to be very similar, and then comparisons will approach dissimilarity (“y”) = 1 with distance. But a second process “noises up” this simple behavior: within each of the steep micro-basins of the study region, there is extreme variation of habitat, causing a full range of differences between site comparisons in short distances.

Data should be allowed to shift your posterior from from your priors. If this isn’t possible, you can do without data.

I understand this, modifying beliefs with data is fundamental to the bayesian approach and why I’m drawn to bayesian methods. Not arguing this. But the worry here is that I am forcing an unrealistic model for the variance on the data, and that it is weakening the utility of the model in other respects. Part of the reason I am suspicious of the large shiftsfrom the priors I set is that the resulting posterior for kappa doesn’t fit the existing literature at all, or common sense - there hasn’t been an ecosystem observed yet with a halving of similarity ("\kappa" in my model) of ~12 m, which is the mean of the posterior distribution for \kappa in the above skewed normal model (except maybe a museum greenhouse!?).

I guess, in my largely-bayesian-ignorant way, I am asking: isn’t it possible that by enforcing a skewed model on the early portion of the data where the curve of the mean is extremely steep (say, x<500 in this case), I am driving the curve of the mean even higher to accommodate a long tail of variance beneath it? Hence the “right angle” shape?

The toy data you were using previously had no upper bound at y=1 , so the symmetric noise seemed fine.

Yes, I broke up the challenges I was having into a couple different example problems, to be handled one-by-one. I saved this y<1 problem for this stage. I wasn’t not overly concerned by this theoretical boundary, because I didn’t want to bow too much to the constraints of the metric (Bray-Curtis dissimilarity), at the risk of ignoring the ecological processes that we are trying to model.

The mean of the skew normal is not μ…

Okay, understood. The variances plotted above are probably actually correct, they are hpds generated from the posterior predictive distributions with az.plot_hpd() To access the true mean of the skew-normal for plotting, etc, would I take mean of the values of “y_pred” in the trace of the posterior, in the above model? It sounds like this might give a more sensible (human-readable) way of looking at the results of the skew normal model.

Ok, that definitely helps. So if this is similarity (y) by distance (xx), where distance runs from 0 to something large and similarity is bound (by construction?) to the interval [0,1]. I guess my intuition would be to flip this entire thing upside-down and have y be a decreasing function of xx:

μ = pm.Deterministic('μ', 1 / (1 + (xx/κ)) )


This is just a re-specfication of your model and one could prefer one or the other based on intuition. But it might be useful because it highlight at least a couple of things that weren’t necessarily obvious (to me) in the original specification. I would call this a decay function. So first, κ can be thought of as a decay rate. Second, thinking about it this way seems to suggest parametric and functional alternatives to what you have here. For example, this seems to suggest functional forms such as exponential (y=e^{-xx/κ}). If you wanted to keep the current form, there are modification that seem worth considering. For example, it’s not clear to me whether it is uncontroversial to assume that the expected similarity at distance zero must be zero. Your data doesn’t obviously seem to suggest such a requirement (again, eye-ball test), but I also don’t know how your similarity measure is constructed. If you wanted to estimate this as a parameter, you could do this kind of thing:

y_0 = pm.Beta('y_0', alpha=1, beta=5)
μ = pm.Deterministic('μ', y_0 / (1 + (xx/κ)) )


You could do something similar to estimate the asymptotic similarity expected as xx \rightarrow \infty (though your data seems to suggest this is unnecessary).

The other thing that you may want to try is wrapping your mean in a truncated normal. You had asked about asymmetric distributions, but if the skew is due to “artificial” bounds on the outcome variable, truncation may be a more natural solution.

Great, thanks for the input here. Don’t know if you’ve dealt with this kind of ecological data, but you nailed it on how most folks deal with this kind of data, as far a doing a log-linear transformation and treating this as a decay problem. And many folks do invert the dissimilarity distances (like Bray-Curtis, which I’m using) into similarity measures, mostly to make them more intuitive.

I’ve been sticking to the asymptote function here, in part because I am looking into the patterns of that early portion of the curve. With this and some other data, we’re really interested in the first “jump” from one micro-watershed to the neighboring one.