Expressing a prior through a Potential

Hello,

I’m trying to find a way to express a prior on the ‘gain’ of an A/B test.
This is the basic setting, without a prior:

import pymc3 as pm

visitors = [10,10]
success = [5,5]

with pm.Model() as model:
    a = pm.Beta('a', alpha=1, beta=1)
    b = pm.Beta('b', alpha=1, beta=1)
    obs_a = pm.Binomial('obs_a', n=visitors[0], p=a, observed=success[0])
    obs_b = pm.Binomial('obs_b', n=visitors[1], p=b, observed=success[1])

    g = pm.Deterministic('g', (b-a)/a)

    trace = pm.sample(tune=2000)

a & b are the clic rates, and g is the relative gain, everything work as expected.

Now, I’m adding a prior to my model. Note: the prior is expressed on g (the gain), not on the click rates.
I’m doing it with a Potential and a normal pdf (and with the same data).

with pm.Model() as model:
    a = pm.Beta('a', alpha=1, beta=1)
    b = pm.Beta('b', alpha=1, beta=1)
    obs_a = pm.Binomial('obs_a', n=visitors[0], p=a, observed=success[0])
    obs_b = pm.Binomial('obs_b', n=visitors[1], p=b, observed=success[1])

    g = pm.Deterministic('g', (b-a)/a)
    prior = pm.Normal.dist(mu=0, sd=10)
    pot = pm.Potential('p', prior.logp(g))

    trace = pm.sample(tune=2000)

Everything works OK:

I get more or less the same posterior as before; since my prior was very weak, this result is expected.

Now I will use a stronger prior with a smaller sd value, expecting to obtain a narrower posterior for g.

with pm.Model() as model:
    a = pm.Beta('a', alpha=1, beta=1)
    b = pm.Beta('b', alpha=1, beta=1)
    obs_a = pm.Binomial('obs_a', n=visitors[0], p=a, observed=success[0])
    obs_b = pm.Binomial('obs_b', n=visitors[1], p=b, observed=success[1])

    g = pm.Deterministic('g', (b-a)/a)
    prior = pm.Normal.dist(mu=0, sd=0.1)  # stronger prior
    pot = pm.Potential('p', prior.logp(g))

    trace = pm.sample(tune=2000)

This leads to some diagnostic problems:

Sampling 4 chains for 2_000 tune and 1_000 draw iterations (8_000 + 4_000 draws total) took 4 seconds.
There was 1 divergence after tuning. Increase target_accept or reparameterize.
There was 1 divergence after tuning. Increase target_accept or reparameterize.
The acceptance probability does not match the target. It is 0.7215948572359366, but should be close to 0.8. Try to increase the number of tuning steps.
There were 3 divergences after tuning. Increase target_accept or reparameterize.
The acceptance probability does not match the target. It is 0.7091417042624631, but should be close to 0.8. Try to increase the number of tuning steps.
The number of effective samples is smaller than 25% for some parameters.

I don’t understand what the problem is, nor its origin.

The posterior of g is narrower, as expected, but also a little less clean than the previous ones, so I guess there is a problem, but I have no clue how to solve it.

I’ve tried more tuning steps (as suggested by the diagnostic message), but I still get the warning "The number of effective samples is smaller than 25% for some parameters.", and the posterior doesn’t look any better, so I guess the problem is deeper than that…
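The warning also suggests increasing target_accept; a minimal sketch of that (the values are only illustrative):

with model:
    # more tuning and a stricter acceptance target, as the warning suggests
    trace = pm.sample(tune=4000, target_accept=0.95)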

Any help will be appreciated.

Regards,
Hubert.

What if you penalize b-a instead of (b-a)/a?

Using (b-a) instead of (b-a)/a only seems to look better:
no diagnostic message and a ‘cleaner’ posterior, but that’s only because it effectively makes a wider prior on the sampled values (so it’s like using sd=1 or 10, where there is no diagnostic error message…).

Changing to sd=0.01 (instead of 0.1) brings back the same diagnostic error messages.
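For reference, the variant being discussed here is presumably the same model with the potential applied to the absolute difference rather than the relative gain, something like:

with pm.Model() as model:
    a = pm.Beta('a', alpha=1, beta=1)
    b = pm.Beta('b', alpha=1, beta=1)
    obs_a = pm.Binomial('obs_a', n=visitors[0], p=a, observed=success[0])
    obs_b = pm.Binomial('obs_b', n=visitors[1], p=b, observed=success[1])

    g = pm.Deterministic('g', (b-a)/a)          # still tracked for reporting
    prior = pm.Normal.dist(mu=0, sd=0.1)
    pot = pm.Potential('p', prior.logp(b - a))  # penalty on b-a instead of (b-a)/a

    trace = pm.sample(tune=2000)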

The issue is the funnel: as a → 0, the potential on (b-a)/a becomes a stricter and stricter constraint that a ≈ b, so steps that were valid for a ~ 0.5 aren’t nearly as valid for a ~ 0.1. As you noted, dropping the denominator eliminated the sampling issue, but your posterior again had that long tail, since (b-a)/a could be large for small values of a.
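A quick numeric illustration of that funnel (plain SciPy, values only for illustration): for the same absolute gap b - a, the Normal(0, 0.1) penalty on (b-a)/a gets dramatically harsher as a shrinks.

from scipy.stats import norm

gap = 0.02                       # same absolute difference b - a
for a in (0.5, 0.1, 0.02):
    g = gap / a                  # relative gain (b - a) / a
    print(f"a={a:4}:  g={g:5.2f},  penalty logp={norm.logpdf(g, loc=0, scale=0.1):8.2f}")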

Instead of adding in a potential, you could parameterize b in terms of the gain, as

g = (b-a)/a --> b = a(g+1)

with g+1 in the range [0, 1/a] (so that b = a(g+1) stays in [0, 1], i.e. g runs from -1 to (1-a)/a). Then:

a = pm.Beta('a', alpha=1, beta=1)
g = pm.TruncatedNormal('gain', mu=0, sd=0.1, lower=-1, upper=(1-a)/a)
b = pm.Deterministic('b', a * (g + 1))

My intuition is that this should fix the geometry. For a → 1, the upper limit → 0 which leaves a bunch of mass in [-1, 0+); while for a → 0, the upper limit → infinity, but it’s penalized by the tail.

You could put a subsequent potential on b, but as it is uniform in this case, it won’t make a difference. Interested to see how this works.

Edit: The important part here is, I think, using a fixed sd on the gain, as opposed to linking it to a, which would cause the same kind of funnel funniness.


Thanks @chartl, your alternate parametrisation is interesting.

My first intuition is that even if it presents the problem differently, it just changes what you sample:

  • My version samples a & b, and adds a constraint on g,
  • Your model samples a & g, and adds a constraint on b.

This is the code (using the same data):

with pm.Model() as model:
    a = pm.Beta('a', alpha=1, beta=1)

    g = pm.TruncatedNormal('gain', mu=0, sd=0.1, lower=-1, upper=(1-a)/a)
    b = pm.Deterministic('b', a * (g + 1))

    obs_a = pm.Binomial('obs_a', n=visitors[0], p=a, observed=success[0])
    obs_b = pm.Binomial('obs_b', n=visitors[1], p=b, observed=success[1])

    trace = pm.sample(tune=2000)

I’m still getting this diagnostic warning: "The number of effective samples is smaller than 25% for some parameters."

The result of your model looks nicer:

Interestingly, the gain posterior looks smoother (nicer), but since the warning is still there, I’m not sure the problem is fully solved.

What do you think?

Can you try with a higher number of samples and show the traces? I’m interested to see if there’s obvious auto-correlation or if the model is getting stuck in a mode. Junpenglao has mentioned that the 25% threshold is quite aggressive compared to Stan:


I just added more draws to the same model:

with pm.Model() as model:
    a = pm.Beta('a', alpha=1, beta=1)

    g = pm.TruncatedNormal('gain', mu=0, sd=0.1, lower=-1, upper=(1-a)/a)
    b = pm.Deterministic('b', a * (g + 1))

    obs_a = pm.Binomial('obs_a', n=visitors[0], p=a, observed=success[0])
    obs_b = pm.Binomial('obs_b', n=visitors[1], p=b, observed=success[1])

    trace = pm.sample(tune=2000, draws=10000)

These are the posteriors:

It looks a lot smoother, but I still got diagnostic warnings…
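(To see which parameters trigger the effective-sample-size warning, the summary table should help; the exact column name, n_eff or ess_*, depends on the PyMC3 version:)

with model:
    print(pm.summary(trace))   # check the effective sample size column per parameter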


The next step of the diagnosis is to use a plot_trace and a pairplot (between gain and a) to see what is going on – though it should be stated that gain is not directly sampled, but instead a transformed variable. I wonder if there’s a way to plot that, too…
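A minimal sketch of those two plots, assuming ArviZ is installed (recent PyMC3 versions delegate their plotting to it):

import arviz as az

with model:
    az.plot_trace(trace, var_names=['a', 'gain'])
    az.plot_pair(trace, var_names=['a', 'gain'], divergences=True)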