# Bayesian Linear Regression
import pymc3 as pm

with pm.Model() as predictive_model:
    x = pm.Data("x", xtrain.values)
    y = pm.Data("y", ytrain.values)
    a = pm.Normal('slope', mu=0, sd=16)
    b = pm.Normal('intercept', mu=0, sd=16)
    s = pm.HalfNormal('error', sd=1)
    y_pred = a * x + b
    obs = pm.Normal('observation', mu=y_pred, sd=s, observed=y)

    start = pm.find_MAP()
    step = pm.NUTS(scaling=start)
    trace = pm.sample(1000, step=step, start=start, progressbar=True)

Now I need to associate a weight with each data point. The weights are based on recency (recent data gets more weight, and the weights are already available to me as an array). How can I assign them?

There are various ways to incorporate weights into your model. Many will write custom logp functions (e.g., here and here). But I personally prefer to adjust the scale parameter of my observed variable, which allows more heavily weighted observations to contribute more to the model's logp. It's simple and relatively transparent. Here's a simple example:

s = pm.HalfNormal('error', sd=1)
scaled_s = pm.Deterministic('scaled_s', s / y_weights)
obs = pm.Normal('observation',
                mu=y_pred,
                sd=scaled_s,
                observed=y)

In addition, I would strongly suggest using the defaults when calling pm.sample() until you are confident that they don't work for you. I would particularly suggest avoiding the use of pm.find_MAP().

Yes, adding the weights integrates the data more deeply into the model and makes it harder to just swap out data sets. In these cases, I tend to use the samples in the trace to generate whatever I need "by hand". Something along these lines:

import numpy as np
import matplotlib.pyplot as plt

n_posterior_samples = 100
for _ in range(n_posterior_samples):
    sample = np.random.choice(trace)
    mean_hat = sample['intercept'] + (sample['slope'] * new_x)
    obs_hat = np.random.normal(loc=mean_hat,
                               scale=sample['error'] / new_weights)
    plt.plot(new_x, obs_hat)
plt.show()

Hi, I was actually working through a similar problem, and in my case the accepted solution would be very inefficient, since I also need to update the model in an online manner. Instead, I found a better way to do the same thing: specify the weights in a data container, and during inference with new data, simply set the data container's value to the test dataset:

After training this model, when you set your test data for inference, simply update the data container:

with predictive_model:
    pm.set_data({"weights": y_weights_test})

You should be able to run it now.
The problem with the pm.Deterministic() call is that it makes each weight a model parameter, and therefore its size becomes fixed.

Let me know if you have any further questions, or any feedback for this trick.

A question about how much impact scaling a parameter has on the model.

If s is the learnable parameter and we set sd=scaled_s, then wouldn't s be skewed towards a smaller value anyway? If yes, wouldn't that reduce the impact the scaling has?

Sorry if I was vague about this. Let me give an example.

Let's say you have 100 response data \boldsymbol{y} that come from two different sources: one from simulation, y_{sim}, and another from real-life experiments, y_{exp}. The sizes of the response data from each source are \text{dim}(y_{sim}) = 95 and \text{dim}(y_{exp}) = 5. You also want to lower the weight of (i.e., reduce the importance of) the simulation data and increase the weight of the real experiments.

The standard deviation on the simulation data is very small, s_{sim} = 0.01, compared to the real experiments, s_{exp} = 0.30.

Say we have two models: the first doesn't weight the data and the second does. Let's say the weight for the simulation data is 0.05 and the weight for the real experiment data is 1. Code chunks for the two models are below.

# model 1
s1 = pm.HalfNormal('error', sd=1)
obs = pm.Normal('observation',
                mu=y_pred,
                sd=s1,
                observed=y)

Wouldn't the s2 posterior be much, much smaller than s1 because the majority of the data in \boldsymbol{y} has a small standard deviation? If the s2 posterior is much smaller than s1, then scaled_s will be similar to s1. Therefore, scaling the data isn't that effective.

Yes, that all seems correct to me. The value of scaled_s is likely to be similar to s1 on average. But the weighting scheme isn't intended to do anything on average; it's intended to weigh some observations more than others. So its effect is relative, not absolute. Maybe it helps to reformulate your model 1 like this:

This is identical to your model 2, but with equal weights applied to all observations, which is the weighting implied by your original model 1. The estimates of s1 should be identical to your original model 1.

Here, we have the same, uniform weighting scheme, but we associate every observation with a weight of 100, instead of the previous weight of 1. The posterior of s1 will likely suggest larger values in order to compensate for the extremely large weights we have applied. But things should mostly balance out.

But this is very different from the following, which is basically the idea suggested in the original answer:

Here, the weights vary across observations. The sampled values of s1 will try to balance the variance in the observed data (y), but must accommodate the fact that some weights are large and some are small.

Thanks @cluhmann. If I understood your example above correctly, we would use a weight of 100 for the more interesting/important observed data like y_{exp}, and the smaller weight of 1 for y_{sim}? The larger weight decreases scaled_s for the interesting/important observed data like y_{exp}, allowing those observations to have more influence on the posterior, while the smaller weight increases scaled_s for y_{sim}. The s1 parameter may end up somewhere between a very large and a small value to balance the overall variance of \boldsymbol{y}. Is my understanding here correct?