The logp outputs are different between bambi and pymc implementations

I was trying to build a GLM that could run on weighted observations, following the suggestions in this answer from the forum: Weights for Model in Bambi - #11 by tcapretto

Unfortunately, when I tried to make it work with Bambi (version 0.12.0), I was getting quite different results for the weighted and unweighted datasets. I then implemented the same model in PyMC and the problem went away (i.e. I got very similar results for both setups). I managed to create a reproducible example which shows what I believe is the reason the results were so different with Bambi. Essentially, the likelihood of a given observation appears to depend not only on the values of that observation and the model parameters, but also on the distribution of the other inputs in the dataset. The example below shows that Bambi produces different logp outputs compared to the “equivalent” model implemented in PyMC. To be fair, I cannot confirm that the models are equivalent, and clearly there must be some difference if the answers differ. Perhaps some normalisation of the data is happening in the background.

I would appreciate it if somebody could point me to the difference, as well as to how to make the logp values the same by adjusting the PyMC and/or Bambi model.

import pandas as pd
import bambi as bmb
import pymc as pm

print("pandas", pd.__version__)
print("bambi", bmb.__version__)
print("pymc", pm.__version__)

dfs = [
    pd.DataFrame(columns=["x", "y"], data=[[2, 0], [2, 0], [1, 1]]),
    pd.DataFrame(columns=["x", "y"], data=[[2, 0], [1, 1], [1, 1]]),
    pd.DataFrame(columns=["x", "y"], data=[[2, 0], [2, 1], [1, 1]]),
]

print("")
print("Bambi model")
print("-" * 30)
init = {'Intercept': 1, 'x': 1}

for df in dfs:
    model = bmb.Model(
        "y ~ 1 + x",
        data=df,
        family="poisson",
        priors={
            "Intercept": bmb.Prior("Normal", mu=1, sigma=1),
            "x": bmb.Prior("Normal", mu=1, sigma=1),
        },
        auto_scale=False,
        noncentered=True,
    )
    model.build()
    logp = model.backend.model.compile_logp(sum=False, jacobian=False)
    print(logp(init)[-1])

print("")
print("PyMC model")
print("-" * 30)
init = {'Intercept': 1, 'w': 1}

for df in dfs:
    with pm.Model() as model:
        intercept = pm.Normal("Intercept", mu=1, sigma=1)
        w = pm.Normal("w", mu=1, sigma=1)

        lam = pm.math.exp(intercept + w * df.x)
        y = pm.Poisson("y", mu=lam, observed=df.y)

        logp = model.compile_logp(sum=False, jacobian=False)
        print(logp(init)[-1])

Outputs:

pandas 1.3.5
bambi 0.12.0
pymc 5.6.1

Bambi model
------------------------------
[-3.79366789 -3.79366789 -1.06227909]
[-5.29449005 -1.28106737 -1.28106737]
[-3.79366789 -2.46033456 -1.06227909]

PyMC model
------------------------------
[-20.08553692 -20.08553692  -5.3890561 ]
[-20.08553692  -5.3890561   -5.3890561 ]
[-20.08553692 -17.08553692  -5.3890561 ]

Hi @cuberoot!

You’re right, there is something happening under the hood. Notice that the initializer of the bmb.Model class has a center_predictors argument.

For efficiency reasons, Bambi mean-centers the predictors when there is an intercept. This has its drawbacks. Mainly, the prior on the intercept is no longer what you think it is: it is not the prior on the mean response when all predictors are zero, but on the mean response when all the predictors are equal to their means.
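As an illustration (this is just a sketch based on the code above, with names like x_centered chosen for the example), mean-centering x by hand in the PyMC model reproduces Bambi’s logp values for the first dataset:

import pandas as pd
import pymc as pm

# First dataset from the example above
df = pd.DataFrame(columns=["x", "y"], data=[[2, 0], [2, 0], [1, 1]])

# Mirror what Bambi does internally: data = data - data.mean(0)
x_centered = df.x - df.x.mean()

with pm.Model() as centered_model:
    intercept = pm.Normal("Intercept", mu=1, sigma=1)
    w = pm.Normal("w", mu=1, sigma=1)

    lam = pm.math.exp(intercept + w * x_centered)
    y = pm.Poisson("y", mu=lam, observed=df.y)

    logp = centered_model.compile_logp(sum=False, jacobian=False)
    # Should print [-3.79366789 -3.79366789 -1.06227909], matching the first Bambi output above
    print(logp({"Intercept": 1, "w": 1})[-1])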

If you want to get the same logp, you can pass center_predictors=False to bmb.Model().

Have a look at the center_predictors argument here: https://bambinos.github.io/bambi/api/Model.html
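For example, assuming Bambi 0.13.0 or later, the model from the original post could be built like this (everything else unchanged):

model = bmb.Model(
    "y ~ 1 + x",
    data=df,
    family="poisson",
    priors={
        "Intercept": bmb.Prior("Normal", mu=1, sigma=1),
        "x": bmb.Prior("Normal", mu=1, sigma=1),
    },
    auto_scale=False,
    noncentered=True,
    center_predictors=False,  # disable mean-centering so logp matches the PyMC model
)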


Thank you for the quick response @tcapretto. Version 0.12.0, which I’m using, doesn’t have this option. Does that mean the centering logic is present in v0.12.0 but there is no way of disabling it?

bmb.Model(
    formula,
    data,
    family='gaussian',
    priors=None,
    link=None,
    categorical=None,
    potentials=None,
    dropna=False,
    auto_scale=True,
    noncentered=True,
    extra_namespace=None,
)

I checked the code, and it looks like v0.12.0 doesn’t have a way of disabling this logic; the option only appeared in v0.13.0. The logic is in the bambi/backend/model_components.py script on line 110:

v0.12.0:

if self.has_intercept:
    self.design_matrix_without_intercept = data
    data = data - data.mean(0)

v0.13.0:

if self.has_intercept and bmb_model.center_predictors:
    self.design_matrix_without_intercept = data
    data = data - data.mean(0)

Thank you for the help @tcapretto


Yeah, that’s right. I recommend upgrading to the latest release.
