PyMC3 gives a shifted result compared to the analytical Bayes solution

Hello
I’m (very) new to pymc3; it’s a great tool that I’d like to learn more about.

I’m trying to understand the syntax and methods by testing pymc3 inference on simple distributions where I can compute the Bayesian result analytically (for example, a single Beta distribution). The pymc3 result is a bit different from the Bayes-optimal one. Does anyone know why this might happen?


    import numpy as np
    import pymc3 as pm

    # Observed coin tosses (1 = heads, 0 = tails)
    observed_data = np.array([1, 1, 1, 0, 1, 1, 1, 1, 1, 1])

    with pm.Model() as model:

        # Beta prior on the coin bias
        beta1_ = pm.Beta('beta1', alpha=5.1, beta=2)

        # Bernoulli likelihood of the observed tosses
        likelihood = pm.Bernoulli('likelihood',
                                  p=beta1_,
                                  observed=observed_data)

        # Run the inference algorithm and keep the posterior samples
        trace = pm.sample(2000,
                          tune=1000,
                          cores=16,
                          chains=1)

observed_data:  [1 1 1 0 1 1 1 1 1 1]
observed_data bias:  0.9
ground truth bias: 0.76767677
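For context, the blue pymc3 curve in the plot below is just the posterior mean of beta1 after refitting the model on the first n observations. Roughly this kind of loop (a sketch, not my exact plotting code, and with fewer cores for the repeated fits):

    import numpy as np
    import pymc3 as pm

    observed_data = np.array([1, 1, 1, 0, 1, 1, 1, 1, 1, 1])

    # Posterior mean of the coin bias after the first n observations
    means = [5.1 / (5.1 + 2)]                    # n = 0: prior mean only
    for n in range(1, len(observed_data) + 1):
        with pm.Model():
            p = pm.Beta('beta1', alpha=5.1, beta=2)
            pm.Bernoulli('likelihood', p=p, observed=observed_data[:n])
            trace = pm.sample(2000, tune=1000, chains=1, cores=1)
        means.append(trace['beta1'].mean())      # MCMC estimate after n tosses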

This gives the weird-looking, offset result shown below.

I’d like to use pymc3 for more complex distributions, but if I can’t even get this simple case right, I’m worried the more complex results won’t be right either.

[figure: bayes_optimal_vs_pymc3]

[EDIT: Update] OK, there was a mistake in the alpha/beta params I used for the analytical Beta posterior. The results are nearly identical now!

[figure: bayes_optimal_vs_pymc3]

I don’t quite understand what your plot means. What’s on the x axis? And what is the “Bayes optimal” curve?

A beta-binomial model like that has a conjugate posterior Beta(a + successes, b + failures), where a and b are your prior parameters. So you should see beta1_ ~ Beta(5.1 + successes, 2 + failures).
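For example, with the data you posted (9 heads, 1 tail) the conjugate update is Beta(5.1 + 9, 2 + 1), and its mean should match the mean of your MCMC samples closely. A quick sanity check (a sketch, assuming the trace from your code above):

    import numpy as np
    from scipy.stats import beta

    observed_data = np.array([1, 1, 1, 0, 1, 1, 1, 1, 1, 1])
    heads = observed_data.sum()          # 9
    tails = len(observed_data) - heads   # 1

    # Analytical (conjugate) posterior: Beta(alpha + heads, beta + tails)
    analytic = beta(5.1 + heads, 2 + tails)
    print(analytic.mean())               # ~0.82

    # Compare with the MCMC estimate from your model:
    # print(trace['beta1'].mean())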

Unrelated to your question, pymc3 is quite old and no longer maintained. I suggest you update to pymc (v>5) if you intend to use the library seriously.

Yes, sorry, indeed I’ll switch over to pymc shortly.

Re: the example, the x axis is the observation number. We want to update our inference after every observation. For example, at x = 0 we have not observed any values yet, so we just get the prior expected value of the latent bias. (This is a coin-toss-like paradigm: 0 = tails, 1 = heads.)

As we observe more tosses of the biased coin, we can update our prediction.

The red curve is indeed the Beta distribution with the update you mentioned; the blue curve is from the pymc3 code above, refit after each new observation.
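Concretely, the red curve is meant to be the running conjugate update after each prefix of the data; a minimal sketch of that computation, assuming the prior Beta(5.1, 2):

    import numpy as np
    from scipy.stats import beta

    observed_data = np.array([1, 1, 1, 0, 1, 1, 1, 1, 1, 1])
    a, b = 5.1, 2.0                          # prior Beta(alpha, beta)

    # Analytical posterior mean after the first n tosses (n = 0 is the prior)
    for n in range(len(observed_data) + 1):
        heads = observed_data[:n].sum()
        tails = n - heads
        print(n, beta(a + heads, b + tails).mean())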

What I’d like to know is why they are not nearly identical.

Your Bayes-optimal curve seems to be calculated incorrectly.

After your first observation, you have a posterior Beta(5.1 + 1, 2), with expectation 6.1/8.1 ≈ 0.75, which is roughly where your second blue dot is (the first dot is just the prior?). But your red dot puts it above 0.8. How did you compute that?
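To spell out the arithmetic: for a Beta(α, β) prior and h heads in n tosses, the posterior mean is

$$
\mathbb{E}[p \mid \text{data}] = \frac{\alpha + h}{\alpha + \beta + n}
= \frac{5.1 + 1}{5.1 + 2 + 1} \approx 0.753 .
$$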

Ok, I’m sorry, let me double check.

[EDIT]: OK, indeed, something was wrong in my analytical Beta computation (a really dumb mistake: I had used Beta(5.2, 1) instead of Beta(5.1, 2)). I think it’s working now; I’m updating the plots above.

Thanks so much for the detailed answers (and I’ll switch to pymc shortly).
