# Weights for Model in Bambi

I am attempting to use weights in a Bambi model. I plan on following the way BRMS or lmer handles weights. Is there a way to do this within Bambi? Any advice would be very helpful.

Bambi does not support this out of the box. But if you can share an example with brms, perhaps I can figure out how to hack Bambi to do it.

1 Like

@tcapretto Thank you for the response. Here is an example model in BRMS that weights by sample size. Please let me know if there is anything else I can do.

``````# Fit the model using brms
model <- brm(
DepVar | weights(SampleSize) ~ IndVar1 + IndVar2,
data = dat)
``````

Oh sorry, I should have clarified better. Can it be a reproducible example (with data, code, and output)? It doesnâ€™t need to be real data, a fake example is enough.

From that code, I can see there are some weights being used, but I donâ€™t know how they are used.

Thanks!

1 Like

@tcapretto Ah sorry about that. Below is an example with output. Thank you again!

``````library(tidyverse)
library(brms)

data <- tibble(score = c(rnorm(1000, 0, 1), c(rnorm(1000, 1, 1))),
weights = c(rep(1, 1000), rep(2, 1000)))

formula <- score | weights(weights) ~ 1

fit_weights <- brm(formula = formula,
family = gaussian(),
data = data,
control = list(adapt_delta = 0.95, max_treedepth  = 10),
chains = 1,
cores = 1,
iter = 700,
warmup = 200)
summary(fit_weights)

``````

1 Like

Thanks for the example!

I see now what you want (i.e. to weigh the contribution of each observation to the likelihood). I
can think of a complex solution that I think solves your problem but itâ€™s not scalable to other likelihoods.

This solution uses a custom family with a custom distribution which is created with DensityDist, thatâ€™s why itâ€™s very complex.

``````import arviz as az
import bambi as bmb
import numpy as np
import pandas as pd
import pymc as pm
import pytensor.tensor as pt

from bambi.families.univariate import UnivariateFamily

rng = np.random.default_rng(1234)
data = pd.DataFrame(
{
"score": np.concatenate([rng.normal(size=1000), rng.normal(loc=1, size=1000)]),
"weights": np.repeat([1, 2], 1000)
}
)

# Create custom "distribution" with DensityDist
def logp(value, mu, sigma, weights):
# This is the same as the logp of a normal, but we have the weights multiplying
res = weights * (-0.5 * pt.pow((value - mu) / sigma, 2) - pt.log(pt.sqrt(2.0 * np.pi)) - pt.log(sigma))
return res

def WeightedNormal(name, mu, sigma, weights, *args, **kwargs):
return pm.DensityDist(name, mu, sigma, weights, *args, **kwargs, logp=logp)

# Create custom family. We need to use the weights in a different way, so we need 'transform_backend_kwargs'.
class WeightedNormalFamily(UnivariateFamily):
SUPPORTED_LINKS = {"mu": ["identity", "log"], "sigma": ["log"]}
@staticmethod
def transform_backend_kwargs(kwargs):
observed = kwargs["observed"]
kwargs["observed"] = observed[:, 0]
kwargs["weights"] = observed[:, 1]
return kwargs

# Create the custom family instance
likelihood = bmb.Likelihood("WeightedNormal", params=["mu", "sigma"], parent="mu", dist=WeightedNormal)
links = {"mu": "identity", "sigma": "log"}

# Create the model in Bambi using the custom family
priors = {
"mu": bmb.Prior("Normal", mu=0, sigma=1),
"sigma": bmb.Prior("HalfNormal", sigma=1),
}
model = bmb.Model("c(score, weights) ~ 1", data, family=wn_family, priors=priors)
model.build()
model.graph()
``````

``````idata = model.fit()
az.summary(idata)
``````

So, this seems to work but it requires to implement a bunch of custom code, which depends on the likelihood. This should be way less painful when we implement it in Bambi.

I imagine something like

``````bmb.Model("weighted(y, weights) ~ ...")
``````

and that should be itâ€¦ But for now itâ€™s an expression of desire

1 Like

@tcapretto Thank you so much for the response. Iâ€™ve implemented it with my own model: there may some slight modifications I will have to make since my model/data is entirely different.

One such modification might be in the transform_backend_kwargs function. I have quickly noticed that predicting with â€śppsâ€ť does not work as it throws a keyerror on â€śobservedâ€ť.

Below is a screenshot of the error traceback. Let me know if there is anything else I can provide. Again, thank you for your help!

Could you share the full traceback? I donâ€™t see the the most recent call (I think itâ€™s the key error looking for â€śobservedâ€ť in â€śkwargsâ€ť). Make sure the `kwargs` dictionary returns the `observed` field as well.

I understand sometimes we cannot share our code because itâ€™s private, but if you can share your custom family, that would be better.

@tcapretto Apologies for the delayed reply. I really appreciate all your help on this.

The custom family is the same as above:

``````# Create custom family. We need to use the weights in a different way, so we need 'transform_backend_kwargs'.
class WeightedNormalFamily(UnivariateFamily):
SUPPORTED_LINKS = {"mu": ["identity", "log"], "sigma": ["log"]}
@staticmethod
def transform_backend_kwargs(kwargs):
observed = kwargs["observed"]
kwargs["observed"] = observed[:, 0]
kwargs["weights"] = observed[:, 1]
return kwargs
``````

Here is the full traceback:

``````---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Cell In[25], line 9
7 N_DRAWS = 10000
8 prediction_matrix = np.zeros((len(pred_dat), N_SIMULATIONS))
----> 9 preds = model.predict(approx_sample, data = pred_dat, kind="pps", inplace = False)

File /anaconda/envs/t/lib/python3.8/site-packages/bambi/models.py:760, in Model.predict(self, idata, kind, data, inplace, include_group_specific)
757 required_kwargs = {"model": self, "posterior": idata.posterior}
758 optional_kwargs = {"data": data}
--> 760 pps = self.family.posterior_predictive(**required_kwargs, **optional_kwargs)
761 pps = pps.to_dataset(name=response_aliased_name)
763 if "posterior_predictive" in idata:

File /anaconda/envs/t/lib/python3.8/site-packages/bambi/families/family.py:167, in Family.posterior_predictive(self, model, posterior, **kwargs)
164 # NOTE: Wouldn't it be better to always use parametrizations compatible with PyMC?
165 # The current approach allows more flexibility, but it's more painful.
166 if hasattr(model.family, "transform_backend_kwargs"):
--> 167     kwargs = model.family.transform_backend_kwargs(kwargs)
169 output_array = pm.draw(response_dist.dist(**kwargs))
170 output_coords_all = xr.merge(output_dataset_list).coords

Cell In[15], line 15, in WeightedNormalFamily.transform_backend_kwargs(kwargs)
13 @staticmethod
14 def transform_backend_kwargs(kwargs):
---> 15     observed = kwargs["observed"]
16     kwargs["observed"] = observed[:, 0]
17     kwargs["weights"] = observed[:, 1]

KeyError: 'observed'
``````
1 Like

I think I made it work. See Iâ€™m using `pm.CustomDist` instead of `pm.DensityDist`. I had a look at the PyMC codebase and it looks like DensityDist will be deprecated at some point and CustomDist is the new thing.

``````import arviz as az
import bambi as bmb
import numpy as np
import pandas as pd
import pymc as pm
import pytensor.tensor as pt

from bambi.families.univariate import UnivariateFamily

rng = np.random.default_rng(1234)
data = pd.DataFrame(
{
"score": np.concatenate([rng.normal(size=1000), rng.normal(loc=1, size=1000)]),
"weights": np.repeat([1, 2], 1000)
}
)

def logp(value, mu, sigma, weights):
# This is the same as the logp of a normal, but we have the weights multiplying
return weights * pm.Normal.logp(value, mu, sigma)

def random(mu, sigma, rng=None, size=None):
# Weights don't make sense when generating new observations.
# They are a property of actual observations
return rng.normal(loc=mu, scale=sigma, size=size)

class WeightedNormal:
def __new__(self, name, mu, sigma, weights, **kwargs):
return pm.CustomDist(name, mu, sigma, weights, logp=logp, **kwargs)

@classmethod
def dist(cls, mu, sigma, **kwargs):
return pm.CustomDist.dist(
mu, sigma, class_name="WeightedNormal", random=random, **kwargs
)

# Create custom family.
# We need to use the weights in a different way, so we need 'transform_backend_kwargs'.
class WeightedNormalFamily(UnivariateFamily):
SUPPORTED_LINKS = {"mu": ["identity", "log"], "sigma": ["log"]}
@staticmethod
def transform_backend_kwargs(kwargs):
if "observed" in kwargs:
observed = kwargs["observed"]
kwargs["observed"] = observed[:, 0]
kwargs["weights"] = observed[:, 1]
return kwargs

# Create the custom family instance
likelihood = bmb.Likelihood("WeightedNormal", params=["mu", "sigma"], parent="mu", dist=WeightedNormal)
links = {"mu": "identity", "sigma": "log"}

# Create the model in Bambi using the custom family
priors = {
"mu": bmb.Prior("Normal", mu=0, sigma=1),
"sigma": bmb.Prior("HalfNormal", sigma=1),
}
model = bmb.Model("c(score, weights) ~ 1", data, family=wn_family, priors=priors)
model.build()
model.graph()

idata = model.fit()
az.summary(idata)

idata_2 = model.predict(idata, data = data, kind="pps", inplace = False)
``````

See itâ€™s a lot of code. Hopefully, we can have this inside Bambi in the future.

Hey @tcapretto, I work with @iarzt. I am trying to use this WeightedNormal model. The short of it is when I try to use the WeightedNormal model with one independent variable, the coefficient is around 1.9. Iâ€™m generating this toy dataset (which does resemble my real dataset), so I know the true coefficient should be 1.0, and a linear model in sklearn learns a coefficient of 1.0. Additionally, a Normal model with the same dependent and independent variables produces a coefficient around 1.0 and a WeightedNormal model trained on just the Intercept matches the dependent variable mean.

Iâ€™ve attached the .pdf and .py files that show this example fully. Please let us know if there is something weâ€™re doing wrong or if you have any insights. Much appreciated!
weights_example_reproducible.py (4.3 KB)
weights_example_reproducible (3).pdf (348.2 KB)

Could it be a in issue of logp vs p? Should the logp be `log(weights) + logp` instead of `weights * logp`?

1 Like

That appears to have fixed it, @ricardoV94, thanks!

Looks like this is just a case of log(x * y) = log(x) + log(y)?

And just to be completely sure, I should use the natural log here, right? Iâ€™d be surprised if it were another base.

Yes natural log

2 Likes