I was trying to build a GLM that runs on weighted observations, following the suggestions in this forum answer: Weights for Model in Bambi - #11 by tcapretto
Unfortunately, I was getting quite different results on the weighted and unweighted datasets when I implemented this with Bambi (version 0.12.0). When I implemented the same model directly in PyMC, the problem went away (i.e. both setups gave very similar results). I managed to create a reproducible example which shows what I believe is the reason for the discrepancy: the likelihood Bambi assigns to a given observation appears to depend not only on that observation's values and the model parameters, but also on the distribution of the other inputs in the dataset. The example below shows that Bambi produces different logp outputs compared to the "equivalent" model implemented in PyMC. To be fair, I cannot confirm that the models are truly equivalent, and clearly there must be some difference given that the answers differ. Perhaps some normalisation of the data is happening in the background.
I would appreciate it if somebody could point me to the difference, and explain how to make the logp values match by adjusting the PyMC and/or Bambi model.
import pandas as pd
import bambi as bmb
import pymc as pm
print("pandas", pd.__version__)
print("bambi", bmb.__version__)
print("pymc", pm.__version__)
dfs = [
    pd.DataFrame(columns=["x", "y"], data=[[2, 0], [2, 0], [1, 1]]),
    pd.DataFrame(columns=["x", "y"], data=[[2, 0], [1, 1], [1, 1]]),
    pd.DataFrame(columns=["x", "y"], data=[[2, 0], [2, 1], [1, 1]]),
]
print("")
print("Bambi model")
print("-" * 30)
init = {"Intercept": 1, "x": 1}
for df in dfs:
    model = bmb.Model(
        "y ~ 1 + x",
        data=df,
        family="poisson",
        priors={
            "Intercept": bmb.Prior("Normal", mu=1, sigma=1),
            "x": bmb.Prior("Normal", mu=1, sigma=1),
        },
        auto_scale=False,
        noncentered=True,
    )
    model.build()
    # Per-variable logp at the init point; the last entry is the observed y
    logp = model.backend.model.compile_logp(sum=False, jacobian=False)
    print(logp(init)[-1])
print("")
print("PyMC model")
print("-" * 30)
init = {"Intercept": 1, "w": 1}
for df in dfs:
    with pm.Model() as model:
        intercept = pm.Normal("Intercept", mu=1, sigma=1)
        w = pm.Normal("w", mu=1, sigma=1)
        # Plain Poisson GLM with a log link: lam = exp(Intercept + w * x)
        lam = pm.math.exp(intercept + w * df.x)
        y = pm.Poisson("y", mu=lam, observed=df.y)
    logp = model.compile_logp(sum=False, jacobian=False)
    print(logp(init)[-1])
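For a single observation the expected PyMC value can be checked by hand: with Intercept = 1 and w = 1, the row (x=2, y=0) gives lam = exp(1 + 1 * 2) ≈ 20.0855 and a Poisson log-pmf of -lam + y * log(lam) - log(y!) = -20.0855, which is exactly what the PyMC model prints below. Here is a minimal standalone check (the scipy call is my own addition, not part of the original repro):
import numpy as np
from scipy import stats
lam = np.exp(1 + 1 * 2)              # exp(Intercept + w * x) for the row (x=2, y=0)
print(stats.poisson.logpmf(0, lam))  # -20.0855..., matching the PyMC output below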
Outputs:
pandas 1.3.5
bambi 0.12.0
pymc 5.6.1
Bambi model
------------------------------
[-3.79366789 -3.79366789 -1.06227909]
[-5.29449005 -1.28106737 -1.28106737]
[-3.79366789 -2.46033456 -1.06227909]
PyMC model
------------------------------
[-20.08553692 -20.08553692 -5.3890561 ]
[-20.08553692 -5.3890561 -5.3890561 ]
[-20.08553692 -17.08553692 -5.3890561 ]
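One pattern I noticed while comparing the numbers: if I mean-center x before computing the linear predictor, I can reproduce Bambi's logp values exactly, for all three datasets. I don't know whether this is actually what Bambi does internally (the centering below is just my guess at the "normalisation in the background"), but it is consistent with the likelihood of one observation depending on the rest of the dataset:
import numpy as np
from scipy import stats
for df in dfs:
    x_c = df.x - df.x.mean()   # mean-centered predictor (my guess, not confirmed Bambi behaviour)
    lam = np.exp(1 + 1 * x_c)  # same init point: Intercept = 1, slope = 1
    print(stats.poisson.logpmf(df.y, lam))
This prints [-3.79366789 -3.79366789 -1.06227909] and so on, matching the Bambi outputs above, which makes me suspect some centering/rescaling of the predictors, but I would appreciate confirmation.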