I was trying to build a GLM that runs on weighted observations, following the suggestions in this forum answer: Weights for Model in Bambi - #11 by tcapretto
Unfortunately, I was getting quite different results on the weighted and unweighted datasets when I implemented this with Bambi (version 0.12.0). When I implemented the same model directly in PyMC, the problem went away (i.e. both setups gave very similar results). I managed to create a reproducible example which shows what I believe is the reason for the discrepancy: the likelihood Bambi assigns to a given observation appears to depend not only on that observation's values and the model parameters, but also on the distribution of the other inputs in the dataset. The example below shows that Bambi produces different logp outputs compared to the "equivalent" model implemented in PyMC. To be fair, I cannot confirm that the models are truly equivalent, and clearly there must be some difference given that the answers differ. Perhaps some normalisation of the data is happening in the background.
I would appreciate it if somebody could point me to the difference, and explain how to make the logp values match by adjusting the PyMC and/or Bambi model.
import pandas as pd
import bambi as bmb
import pymc as pm
print("pandas", pd.__version__)
print("bambi", bmb.__version__)
print("pymc", pm.__version__)
dfs = [
    pd.DataFrame(columns=["x", "y"], data=[[2, 0], [2, 0], [1, 1]]),
    pd.DataFrame(columns=["x", "y"], data=[[2, 0], [1, 1], [1, 1]]),
    pd.DataFrame(columns=["x", "y"], data=[[2, 0], [2, 1], [1, 1]]),
]
print("")
print("Bambi model")
print("-" * 30)
init = {"Intercept": 1, "x": 1}
for df in dfs:
    model = bmb.Model(
        "y ~ 1 + x",
        data=df,
        family="poisson",
        priors={
            "Intercept": bmb.Prior("Normal", mu=1, sigma=1),
            "x": bmb.Prior("Normal", mu=1, sigma=1),
        },
        auto_scale=False,
        noncentered=True,
    )
    model.build()
    # Per-variable logp at the init point; the last entry is the observed y
    logp = model.backend.model.compile_logp(sum=False, jacobian=False)
    print(logp(init)[-1])
print("")
print("PyMC model")
print("-" * 30)
init = {"Intercept": 1, "w": 1}
for df in dfs:
    with pm.Model() as model:
        intercept = pm.Normal("Intercept", mu=1, sigma=1)
        w = pm.Normal("w", mu=1, sigma=1)
        # Plain Poisson GLM with a log link: lam = exp(Intercept + w * x)
        lam = pm.math.exp(intercept + w * df.x)
        y = pm.Poisson("y", mu=lam, observed=df.y)
    logp = model.compile_logp(sum=False, jacobian=False)
    print(logp(init)[-1])
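For a single observation the expected PyMC value can be checked by hand: with Intercept = 1 and w = 1, the row (x=2, y=0) gives lam = exp(1 + 1 * 2) ≈ 20.0855 and a Poisson log-pmf of -lam + y * log(lam) - log(y!) = -20.0855, which is exactly what the PyMC model prints below. Here is a minimal standalone check (the scipy call is my own addition, not part of the original repro):
import numpy as np
from scipy import stats
lam = np.exp(1 + 1 * 2)              # exp(Intercept + w * x) for the row (x=2, y=0)
print(stats.poisson.logpmf(0, lam))  # -20.0855..., matching the PyMC output below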
Outputs:
pandas 1.3.5
bambi 0.12.0
pymc 5.6.1
Bambi model
------------------------------
[-3.79366789 -3.79366789 -1.06227909]
[-5.29449005 -1.28106737 -1.28106737]
[-3.79366789 -2.46033456 -1.06227909]
PyMC model
------------------------------
[-20.08553692 -20.08553692 -5.3890561 ]
[-20.08553692 -5.3890561 -5.3890561 ]
[-20.08553692 -17.08553692 -5.3890561 ]
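One pattern I noticed while comparing the numbers: if I mean-center x before computing the linear predictor, I can reproduce Bambi's logp values exactly, for all three datasets. I don't know whether this is actually what Bambi does internally (the centering below is just my guess at the "normalisation in the background"), but it is consistent with the likelihood of one observation depending on the rest of the dataset:
import numpy as np
from scipy import stats
for df in dfs:
    x_c = df.x - df.x.mean()   # mean-centered predictor (my guess, not confirmed Bambi behaviour)
    lam = np.exp(1 + 1 * x_c)  # same init point: Intercept = 1, slope = 1
    print(stats.poisson.logpmf(df.y, lam))
This prints [-3.79366789 -3.79366789 -1.06227909] and so on, matching the Bambi outputs above, which makes me suspect some centering/rescaling of the predictors, but I would appreciate confirmation.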