How to add weights to data in Bayesian linear regression

I have implemented a Bayesian model like this:

# Bayesian Linear Regression
with pm.Model() as predictive_model:
    x = pm.Data("x", xtrain.values)
    y = pm.Data("y", ytrain.values)
    a = pm.Normal('slope', mu=0, sd=16)
    b = pm.Normal('intercept', mu=0, sd=16)
    s = pm.HalfNormal('error', sd=1)
    y_pred = a * x + b

    obs = pm.Normal('observation', mu=y_pred, sd=s, observed=y)

    start = pm.find_MAP()
    step = pm.NUTS(scaling=start)
    trace = pm.sample(1000, step, start=start, progressbar=True)

Now I need to add a weight for each data point. The weights are based on the recency of the data (recent data is given more weight), and the weight array is already available to me. How can I assign them?

Welcome!

There are various ways to incorporate weights into your model. Many will write custom logp functions (e.g., here and here). But I personally prefer to adjust the scale parameter of my observed variable, which allows more heavily weighted observations to contribute more to the model’s logp. It’s simple and relatively transparent. Here’s a simple example:

s = pm.HalfNormal('error', sd=1)
scaled_s = pm.Deterministic('scaled_s', s / y_weights)
obs = pm.Normal('observation',
                mu=y_pred,
                sd=scaled_s,
                observed=y)
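
Folded into your model above, the whole thing would look roughly like this (a sketch only; y_weights is assumed to be a NumPy array of positive weights, one per training observation, which isn’t defined in your snippet):

# Sketch: the original model with the weighted likelihood dropped in.
# y_weights is an assumed array of positive per-observation weights.
with pm.Model() as predictive_model:
    x = pm.Data("x", xtrain.values)
    y = pm.Data("y", ytrain.values)
    a = pm.Normal('slope', mu=0, sd=16)
    b = pm.Normal('intercept', mu=0, sd=16)
    s = pm.HalfNormal('error', sd=1)
    y_pred = a * x + b

    # larger weight -> smaller sd -> that observation pulls harder on the fit
    scaled_s = pm.Deterministic('scaled_s', s / y_weights)
    obs = pm.Normal('observation', mu=y_pred, sd=scaled_s, observed=y)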

In addition, I would strongly suggest using the defaults when calling pm.sample() until you are confident that they don’t work for you. In particular, I would suggest avoiding the use of pm.find_MAP():

# nice and simple
trace = pm.sample(1000)

@cluhmann Thanks. I added the weights, but I am getting an error while running sample_posterior_predictive on the test set:

with predictive_model:
    pm.set_data(dict(x=xtest.values, y=ytest.values))
    posterior = pm.sample_posterior_predictive(trace)  # <-- raises the error below
...
ValueError: Cannot broadcast provided shapes (7,), (73,), () given size: ()

This issue doesn’t occur when I don’t use weights.
My training set has 73 data points with 1 feature, and the test set has 7 data points.

Yes, adding the weights integrates the data more deeply into the model and makes it harder to just swap out data sets. In these cases, I tend to use the samples in the trace to generate whatever I need “by hand”. Something along these lines:

import matplotlib.pyplot as plt
import numpy as np

n_posterior_samples = 100
for _ in range(n_posterior_samples):
    # grab one random draw from the posterior
    sample = trace[np.random.randint(len(trace))]
    mean_hat = sample['intercept'] + (sample['slope'] * new_x)
    # scale the error by the new weights, just as in the model
    obs_hat = np.random.normal(loc=mean_hat,
                               scale=sample['error'] / new_weights)
    plt.plot(new_x, obs_hat)
plt.show()

Hi, I was actually working through a similar problem, and in my case the accepted solution would be very inefficient since I also need to update the model in an online manner. Instead, I found a better way to do the same thing by putting the weights in a data container; during inference on new data, you simply set the data container’s value to the test weights:

s = pm.HalfNormal('error', sd=1)
weights = pm.Data("weights", y_weights)
scaled_s = s / weights
obs = pm.Normal('observation',
                mu=y_pred,
                sd=scaled_s,
                observed=y)

After training with this, during inference, when you set your test data, simply update the data container with

pm.set_data({"weights": y_weights_test})

You should be able to run it now.
The problem with the pm.Deterministic() call is that it makes each weight a parameter of the model, and therefore its size becomes fixed.
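
Putting the pieces together, the predictive step might look roughly like this sketch (y_weights_test is assumed to be an array with the same length as xtest; note that pm.set_data takes a dict and is called inside the model context):

with predictive_model:
    # swap every data container over to the test set in one call
    pm.set_data({"x": xtest.values,
                 "y": ytest.values,
                 "weights": y_weights_test})
    posterior = pm.sample_posterior_predictive(trace)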

Let me know if you have any further questions, or any feedback for this trick.


A question about how much impact scaling a parameter has on the model.

If s is the learnable parameter and we set sd=scaled_s, then wouldn’t s be skewed towards a smaller value anyway? If yes, then wouldn’t that reduce the impact scaling has?

It’s not clear what you mean by this. Skewed in what way?

Sorry if I was vague about this. Let me give an example.

Let’s say you have 100 response data points $\boldsymbol{y}$ that come from two different sources: one from simulation, $y_{sim}$, and another from real-life experiments, $y_{exp}$. The sizes of the response data from each source are $\text{dim}(y_{sim}) = 95$ and $\text{dim}(y_{exp}) = 5$. You also want to lower the weight of (reduce the importance of) the simulation data and increase the weight of the real experiments.

The standard deviation of the simulation data is very small, $s_{sim} = 0.01$, compared to the real experiments, $s_{exp} = 0.30$.

Say we have two models: the first doesn’t weight the data, and the second does. Let’s say the weight for the simulation data is 0.05 and for the real experimental data is 1. Below are code chunks for the two models.

# model 1
s1 = pm.HalfNormal('error', sd=1)
obs = pm.Normal('observation',
                mu=y_pred,
                sd=s1,
                observed=y)

# model 2
s2 = pm.HalfNormal('error', sd=1)
weights = pm.Data("weights", y_weights)  # w_sim = 0.05, w_exp = 1
scaled_s = s2 / weights
obs = pm.Normal('observation',
                mu=y_pred,
                sd=scaled_s,
                observed=y)

Wouldn’t the s2 posterior be much, much smaller than s1 because the majority of the data in $\boldsymbol{y}$ has a small standard deviation? If the s2 posterior is much smaller than s1, then scaled_s will be similar to s1. Therefore, scaling the data isn’t that effective.

Any thoughts on this?

Yes, that all seems correct to me. The value of scaled_s is likely to be similar to s1 on average. But the weighting scheme isn’t intended to do anything on average; it’s intended to weight some observations more than others. So its effect is relative, not absolute. Maybe it helps to reformulate your model 1 like this:

# model 1
s1 = pm.HalfNormal('error', sd=1)
weights = pm.Data("weights", np.ones(N))
scaled_s = s1 / weights
obs = pm.Normal('observation',
                mu=y_pred,
                sd=scaled_s,
                observed=y)

This is identical to your model 2, but with equal weights applied to all observations, which is the weighting implied by your original model 1. The estimates of s1 should be identical to those from your original model 1.

Now we can do the following:

# model 1
s1 = pm.HalfNormal('error', sd=1)
weights = pm.Data("weights", 100 * np.ones(N))
scaled_s = s1 / weights
obs = pm.Normal('observation',
                mu=y_pred,
                sd=scaled_s,
                observed=y)

Here, we have the same uniform weighting scheme, but we associate every observation with a weight of 100 instead of the previous weight of 1. The posterior of s1 will likely suggest larger values in order to compensate for the extremely large weights we have applied. But things should mostly balance out.
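
For intuition, here is a quick numerical sketch of that "balancing out" claim (plain NumPy, assuming a Normal likelihood with sd = s / w): the s that maximizes the likelihood grows in proportion to a uniform weight w, so the implied sd stays the same.

import numpy as np

# Rough sanity check: grid-search the s that maximizes a Normal log-likelihood
# with sd = s / w for two different uniform weights.
rng = np.random.default_rng(0)
residuals = rng.normal(0, 0.5, size=1000)  # stand-in for y - y_pred

def best_s(residuals, w, grid=np.linspace(0.01, 100, 5000)):
    sd = grid[:, None] / w
    loglik = (-0.5 * (residuals / sd) ** 2 - np.log(sd)).sum(axis=1)
    return grid[np.argmax(loglik)]

print(best_s(residuals, w=1))    # roughly 0.5
print(best_s(residuals, w=100))  # roughly 50: 100x larger, so s / w is unchanged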

But this is very different from the following, which is basically the idea suggested in the original answer:

# model 1
s1 = pm.HalfNormal('error', sd=1)
weights = pm.Data("weights", [100, 100, 1, 1, 100, 1, 100, 1, 1, 1, 100])
scaled_s = s1 / weights
obs = pm.Normal('observation',
                mu=y_pred,
                sd=scaled_s,
                observed=y)

Here, the weights vary across observations. The sampled values of s1 will try to balance the variance in the observed data (y), but must accommodate the fact that some weights are large and some are small.

Does that help?


Thanks @cluhmann. If I understood your example above correctly, we would use a weight of 100 for the more interesting/important observed data like $y_{exp}$ and the smaller weight of 1 for $y_{sim}$? The larger weight decreases scaled_s for the interesting/important observed data like $y_{exp}$, allowing those observations to have more influence on the posterior, and the smaller weight increases scaled_s for $y_{sim}$. The s1 parameter may end up somewhere between a very large and a very small value to balance the overall variance of $\boldsymbol{y}$. Is my understanding here correct?


Exactly!
