Adding weights to a GP model

I am trying to model bid-to-impression response and its uncertainty, accounting for heteroskedastic noise, sparsity, and time decay. PyMC's Gaussian process support seems like a good fit, and I have a model that I like, but it does not incorporate time-decay weights. Older data is less reliable, but I would still like to include it. I am looking for advice on how to incorporate a weight for each data point while still modeling heteroskedastic noise (the one idea I have had so far is sketched after the code below).

import pymc as pm
import numpy as np
import matplotlib.pyplot as plt
from pymc.gp.util import plot_gp_dist
import pytensor.tensor as pt

np.random.seed(42)

# Generating sample data

# Generating the first half of the data
bid_first_half = np.random.uniform(10, 18, size=50)
impressions_first_half = bid_first_half * 100 + np.random.normal(loc=0, scale=300, size=50)

# Generating the second half of the data
bid_second_half = np.random.uniform(22, 30, size=50)
impressions_second_half = bid_second_half * 100 + np.random.normal(loc=0, scale=600, size=50)

# Joining the data
bid = np.concatenate([bid_first_half, bid_second_half])
impressions = np.concatenate([impressions_first_half, impressions_second_half])

bid_ = bid[:, None]  # GP inputs need to be column vectors

# Time-decay weights: more recent data gets a higher weight
weights = np.linspace(0.1, 1, num=100)

# Normalize the weights so they sum to 1
weights /= weights.sum()
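# NOTE: these weights are what I want to incorporate; the model below does
# not use them yet.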

# Define the model: a latent GP for the log noise level plus the main response GP
with pm.Model() as model:
    # Prior for the noise model parameters
    noise_ls = pm.Gamma("noise_ls", alpha=2, beta=1)
    noise_sv = pm.HalfNormal("noise_sv", sigma=1)

    # Noise model
    noise_cov_func = noise_sv**2 * pm.gp.cov.Matern52(input_dim=1, ls=noise_ls)
    noise_gp = pm.gp.Latent(cov_func=noise_cov_func)
    f_noise = noise_gp.prior("f_noise", X=bid_)

    # Now plug the noise GP into the main model as the observation noise.
    # Priors and covariance function for the main response GP:
    length_scale = pm.Gamma("length_scale", alpha=1, beta=2)
    signal_variance = pm.HalfNormal("signal_variance", sigma=np.std(impressions))

    cov_func = signal_variance**2 * pm.gp.cov.Matern52(input_dim=1, ls=length_scale)

    # The Gaussian Process
    gp = pm.gp.Marginal(cov_func=cov_func)

    # Observing the data with heteroskedastic noise
    y_ = gp.marginal_likelihood("y", X=bid_, y=impressions, sigma=pt.exp(f_noise))

    trace = pm.sample(2000, tune=1000, chains=2, target_accept=0.9, nuts_sampler="numpyro")


# Checking the results
#pm.plot_trace(trace)
#plt.show()


# New bid values from 10 to 30 for prediction
bid_new = np.linspace(10, 30, 100)[:, None]

# add the GP conditional to the model, given the new X values
with model:
    f_pred = gp.conditional("f_pred", bid_new)

# sample from the gp posterior conditioned on the observed data
with model:
    pred_samples = pm.sample_posterior_predictive(trace, var_names=["f_pred"])

# convert "f_pred" to a numpy array
pred_samples_array = pred_samples["posterior_predictive"]["f_pred"].values

# flatten the chain and draw dimensions together
pred_samples_flat = pred_samples_array.reshape(-1, pred_samples_array.shape[-1])

# plot the results
fig = plt.figure(figsize=(12,5)); ax = fig.gca()

# plot the samples from the GP posterior with samples and shading
plot_gp_dist(ax, pred_samples_flat, bid_new)

# plot the observed data
plt.plot(bid, impressions, 'ok', ms=3, alpha=0.5, label="Observed data");

# axis labels and title
plt.xlabel("Bid"); plt.ylabel("Impressions");
plt.title("Posterior distribution over $f(x)$ at the observed values"); plt.legend();

plt.show()

# compute mean and standard deviation
mu = pred_samples_array.mean(axis=(0, 1))
sd = pred_samples_array.std(axis=(0, 1))

fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(12,8), gridspec_kw={'height_ratios': [3, 1]})

# plot the mean line
ax1.plot(bid_new, mu, 'r', lw=2, label="mean")
ax1.fill_between(bid_new.flatten(), mu - sd, mu + sd, color="r", alpha=0.5, label="uncertainty")
ax1.plot(bid, impressions, 'ok', ms=3, alpha=0.5, label="Observed data")

# plot the uncertainty (standard deviation) as a bar chart
ax2.bar(bid_new.flatten(), sd, color='blue', width=0.1, align='center', label='uncertainty')
ax2.set_ylim(bottom=0)

# axis labels and title
ax1.set_xlabel("Bid"); ax1.set_ylabel("Impressions");
ax1.set_title("Mean and uncertainty of the GP prediction"); ax1.legend();

ax2.set_xlabel("Bid"); ax2.set_ylabel("Uncertainty");

plt.tight_layout()
plt.show()
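For reference, the only concrete idea I have had so far is to fold the weights into the per-point noise, inflating the noise for low-weight (older) points so they pull on the fit less. This is untested and I am not sure it is principled, but it would be a small change to the marginal likelihood above:

# (inside the `with pm.Model()` block, replacing the original y_)
# Untested idea: treat low-weight (old) points as noisier. Only the relative
# weights matter here, so the sum-to-one normalization may need rethinking.
sigma_weighted = pt.exp(f_noise) / np.sqrt(weights)
y_ = gp.marginal_likelihood("y", X=bid_, y=impressions, sigma=sigma_weighted)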

No responses… I've even been trying to hire some help, but it seems that adding weights is non-trivial, so I have a new idea. Instead of using weights, I could take an online-learning approach: fit the model on a smaller window of data, say one or two weeks, and then each day fit a new model using the previous model's posterior as the prior. This should let the model retain old knowledge (especially important for seldom-explored areas of the bid space) while incorporating new information as time goes by. As an added bonus, the daily computational cost would be lower, since I would not be refitting the full historical dataset every day. Is this a reasonable approach? Has anyone done something similar?
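To make this concrete, here is a rough, untested sketch of what I have in mind. fit_window is a placeholder name, and moment-matching only carries over the kernel hyperparameters (not the latent function), so I know this is only a crude form of knowledge retention:

import numpy as np
import pymc as pm

def fit_window(X, y, prior_params=None):
    """Fit the GP on one window of data, optionally starting from yesterday's posterior."""
    with pm.Model():
        if prior_params is None:
            # Day one: weakly informative priors, as in the original model
            length_scale = pm.Gamma("length_scale", alpha=1, beta=2)
            signal_variance = pm.HalfNormal("signal_variance", sigma=np.std(y))
        else:
            # Later days: moment-match yesterday's posterior; TruncatedNormal
            # keeps the hyperparameters positive
            mu_ls, sd_ls = prior_params["length_scale"]
            mu_sv, sd_sv = prior_params["signal_variance"]
            length_scale = pm.TruncatedNormal("length_scale", mu=mu_ls, sigma=sd_ls, lower=0)
            signal_variance = pm.TruncatedNormal("signal_variance", mu=mu_sv, sigma=sd_sv, lower=0)

        cov_func = signal_variance**2 * pm.gp.cov.Matern52(input_dim=1, ls=length_scale)
        gp = pm.gp.Marginal(cov_func=cov_func)
        sigma = pm.HalfNormal("sigma", sigma=np.std(y))  # homoskedastic here, for brevity
        gp.marginal_likelihood("y", X=X, y=y, sigma=sigma)
        trace = pm.sample(1000, tune=1000, chains=2)

    # Summarize today's posterior so tomorrow's fit can use it as a prior
    post = trace.posterior
    new_params = {
        name: (float(post[name].mean()), float(post[name].std()))
        for name in ["length_scale", "signal_variance"]
    }
    return trace, new_params

# Each morning: trace, params = fit_window(X_window, y_window, prior_params=params)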

Why not include time as another predictor? That is effectively the same as weighting by time, except the GP learns the weighting from the data.
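Roughly like this, as an untested sketch. Here days_ago is a hypothetical array giving each observation's age in days (0 = today), and the kernel choices are just a starting point:

import numpy as np
import pymc as pm

# Treat (bid, days_ago) as a 2D input; the GP learns the effective time
# decay through the time length scale ls_time
X = np.column_stack([bid, days_ago])  # shape (n, 2)

with pm.Model() as model:
    ls_bid = pm.Gamma("ls_bid", alpha=2, beta=1)
    ls_time = pm.Gamma("ls_time", alpha=2, beta=0.1)
    signal_variance = pm.HalfNormal("signal_variance", sigma=np.std(impressions))

    # Product kernel: correlation falls off with distance in bid AND in time
    cov = (
        signal_variance**2
        * pm.gp.cov.Matern52(input_dim=2, ls=ls_bid, active_dims=[0])
        * pm.gp.cov.ExpQuad(input_dim=2, ls=ls_time, active_dims=[1])
    )

    gp = pm.gp.Marginal(cov_func=cov)
    sigma = pm.HalfNormal("sigma", sigma=np.std(impressions))  # homoskedastic for brevity
    gp.marginal_likelihood("y", X=X, y=impressions, sigma=sigma)
    trace = pm.sample(1000, tune=1000)

# Predict today's bid-response curve by conditioning at days_ago = 0
bid_grid = np.linspace(10, 30, 100)
X_new = np.column_stack([bid_grid, np.zeros_like(bid_grid)])
with model:
    f_pred = gp.conditional("f_pred", X_new)

A short ls_time then plays the role of fast time decay, while a long one means old data stays informative, and it is inferred rather than hand-tuned.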

Thanks for taking the time to read and reply; I really appreciate it. Sometimes the simple solutions can be elusive, especially for a beginner like me. I'm going to try this approach. I don't know if you offer consulting services, and I don't have much to spend, but this project is very important to me. I could offer $100 for a code review once I finish; let me know if that's something you would be able to help with, or if you have any contacts who would. Anyway, thanks again for pointing me in the right direction.

Modeling time as a dimension worked wonderfully. Thanks again. Now I just need to figure out how to run thousands of these predictions every morning 🙂


Nice! Glad to hear that.