Pm.sample_posterior_predictive() not working with weights

Hi,

I’m trying to consider weights in my model using pm.Potential(). It works good until I try to run pm.sample_posterior_predictive() on it. It starts and finishes within 1 sec and is just empty.

If I add weights, won’t I be able to sample the posterior predictive?

with pm.Model() as model:
    # Data
    x_obs_shared = pm.Data('x_obs_shared', x_obs)
    group_id_shared = pm.Data('group_id_shared', group_id)

    # Hyperpriors for group nodes
    alpha_mu = pm.Gamma('alpha_mu', mu = 100, sigma = 50)
    alpha_sigma = pm.HalfCauchy('alpha_sigma', beta = 5.0)

    beta_mu = pm.Gamma('beta_mu', mu = 100, sigma = 50)
    beta_sigma = pm.HalfCauchy('beta_sigma', beta =  5.0)

    # Priors
    alpha = pm.Gamma('alpha', mu = alpha_mu, sigma = alpha_sigma, shape = n_groups)

    beta = pm.Gamma('beta', mu = beta_mu, sigma = beta_sigma, shape = n_groups)
    beta_negative = pm.Deterministic('beta_negative', -beta)

    sigma = pm.HalfCauchy('sigma', beta =  5.0)
    nu = pm.InverseGamma('nu', alpha = 1, beta = 1)
    
    # Expected values
    y_est = alpha[group_id_shared] + beta_negative[group_id_shared]*x_obs_shared

    # Data likelihoods
    logp = weights * pm.StudentT.dist(nu = nu, mu = y_est, sigma = sigma).logp(y_obs)
    y_like = pm.Potential('y_like', logp)
1 Like

Ah yeah, unfortunately forward sampling functions (i.e sample_prior_predictive and sample_posterior_predictive) don’t take into account Potentials yet :confused:
All is not lost though, as you can do it by hand. I’m thinking of something like:

ppc =  weights * pm.StudentT.dist(nu = trace["nu"], mu = trace["y_est"], sigma = trace["sigma"]).random(size=1000)

@junpenglao am I not too wrong here? Is there a better to do it?

1 Like

I dont think that’s correct - imagine if weights is [1., 0., 0., …], you would expect only getting samples from StudentT(nu=nu[0], mu=y_est[0], sigma=sigma[0]), but returning weight * random_ppc_samples will give you a whole bunch of 0s.

Unfortunately, I dont think there is a principle way to do this other than encoding the weight into your prior. Usually it is done by prior in Sigma (i.e., larger weight -> more information in the observation -> small sigma). There is also a similar discussion in Stan forum: https://discourse.mc-stan.org/t/bayesian-parallels-of-weighted-regression/16152/7

4 Likes

From what I got there it seems like one way could be to include the weights as a second predictor.
I.e. going from y = a + b0*x to y = a + b0*x + b1*w

The caveat here that I see here is that it might mean that I need to include weights when predicting new data, but that doesn’t bother me that much :slight_smile:

But to your suggestion, using it as a prior for sigma, would I define it like sigma = pm.HalfCauchy('sigma', beta = 1 - w_obs)?

Yeah something like that should work - y = a + b0*x + b1*w also makes sense to me if you consider linear regression as “explain away the variance”.

1 Like

Ah yeah, good point. I always learn something when Junpeng is around :smiley:

1 Like

Shouldn’t the weight go into the likelihood rather than prior?

Something like:

sigma = pm.HalfCauchy('sigma', beta = 5.0)
...
logp = weights * pm.StudentT.dist(nu = nu, mu = y_est, sigma = sigma*(1-w_obs)).logp(y_obs)

assuming w_obs is strictly less than 1

For adding weights as part of sigma, I’m afraid that it didn’t work like that. Don’t think I should add the weights via observed either. At least it makes the plot_ppc look terrible :slightly_frowning_face:

Currently, my full model looks like this:

with pm.Model() as linear_model:
    # Shared Data
    load_obs_shared = pm.Data('load_obs_shared', load_obs)
    velocity_obs_shared = pm.Data('velocity_obs_shared', vel_obs)
    exercise_id_shared = pm.Data('exercise_id_shared', exercise_id)

    # Hyperpriors
    alpha_mu = pm.Gamma('alpha_mu', mu = 120, sigma = 5)
    alpha_sigma = pm.HalfCauchy('alpha_sigma', beta = 1)
    beta_mu = pm.Gamma('beta_mu', mu = 80, sigma = 5)
    beta_sigma = pm.HalfCauchy('beta_sigma', beta = 1)

    # General Priors
    alpha = pm.Gamma('alpha', mu = alpha_mu, sigma = alpha_sigma, shape = n_exercises)
    beta = pm.Gamma('beta', mu = beta_mu, sigma = beta_sigma, shape = n_exercises)
    beta_negative = pm.Deterministic('beta_negative', -beta)

    # Normal Model (y ~ x)
    load_sigma = pm.HalfCauchy('load_sigma', beta = 1)
    load_nu = pm.InverseGamma('load_nu', alpha = 1, beta = 1)
    load_est = alpha[exercise_id_shared] + beta_negative[exercise_id_shared]*velocity_obs_shared
    load_like = pm.StudentT('load_like', nu = load_nu, mu = load_est, sigma = load_sigma, observed = load_obs_shared)

    # Inverse Model (x ~ y)
    velocity_sigma = pm.HalfCauchy('velocity_sigma', beta = 1)
    velocity_nu = pm.InverseGamma('velocity_nu', alpha = 1, beta = 1)
    velocity_est = (load_obs_shared - alpha[exercise_id_shared])/beta_negative[exercise_id_shared]
    velocity_like = pm.StudentT('velocity_like', nu = velocity_nu, mu = velocity_est, sigma = velocity_sigma, observed = velocity_obs_shared)

And I’m looking to include the weights here:

# Normal Model (y ~ x)
load_sigma = pm.HalfCauchy('load_sigma', beta = 1)
...
# Inverse Model (x ~ y)
velocity_sigma = pm.HalfCauchy('velocity_sigma', beta = 1)

Modelling y~x and x~y in the same model doesnt make much sense to me - you are essentially double dipping and use the data twice and also making the model none generative (there is a circular dependency in your graph)

1 Like

A bit off topic, but would it be better to handle them like two separately models?

On topic, could you provide an example how to include the weight into the sigma priors?

Thank you all for your help, this is an awesome community! :slight_smile:

yep

Taking the example from statsmodels, it goes something like this:

import numpy as np
import statsmodels.api as sm

Y = np.asarray([1,3,4,5,2,3,4]) + np.arange(1,8) - 5
x = np.arange(1,8)
weights = np.arange(1,8)
X = sm.add_constant(x)
wls_model = sm.WLS(Y, X, weights)
results = wls_model.fit()
results.params
# ==> array([-2.08333333,  1.0952381 ])

import pymc3 as pm
with pm.Model() as m:
    intercept = pm.Normal('c', 0., 10.)
    beta = pm.Normal('b', 0., 5.)
    # Total variance
    sigma = pm.HalfNormal('s', 2.)
    # Scale by weights
    # The weights are presumed to be (proportional to) the 
    # inverse of the variance of the observations.
    tau = 1 / sigma**2 * (weights / weights.sum())
    pm.Normal('y', beta*x + intercept, tau=tau, observed=Y)
    trace = pm.sample(return_inferencedata=True)

pm.summary(trace)['mean']
# c   -1.971
# b    1.076
# s    0.618
# Name: mean, dtype: float64

There is a bit of divergence so the prior needs some more care, but that’s the general idea.

3 Likes

Even closer to WLS would be:

with pm.Model() as m2:
    intercept = pm.Normal('c', 0., 10.)
    beta = pm.Normal('b', 0., 5.)
    pm.Normal('y', beta*x + intercept, tau=weights, observed=Y)
    map_result = pm.find_MAP()
map_result
# ==> {'c': array(-2.07381709), 'b': array(1.09348298)}
1 Like

I’m trying to achieve a similar thing in the context of a Gamma GLM.
I’ve followed this topic to create a potential which I can weight.

I’ve had an idea about how to sample from the posterior predictive but I’m not convinced it’s correct: create a second model, replacing the potential with the Gamma function and use that only to sample from the posterior predictive.

For example, if this is the model I’m using for sampling:

# x = regressors
# y = target
# w = weights

with pm.Model() as m1:
    α = pm.Normal("α", 0, 1)
    β = pm.Normal("β", 0, 1)
    shape = pm.Uniform("shape", 0, 100)

    # Log-link function
    μ = at.exp(α + β*x)

    logp = w * pm.logp(pm.Gamma.dist(alpha=shape, beta=shape/μ), y)
    potential = pm.Potential("potential", logp)

    trace = pm.sample()

and then use this one for prediction:

with pm.Model() as m2:
    α = pm.Normal("α", 0, 1)
    β = pm.Normal("β", 0, 1)
    shape = pm.Uniform("shape", 0, 100)

    # Log-link function
    μ = at.exp(α + β*x)

    pm.Gamma(alpha=shape, beta=shape/μ, observed=y)

with m2:
    trace.extend(pm.sample_posterior_predictive(trace))

Would that be a correct way? I’m aware of it that the weights don’t come into account in the posterior predictive, but that makes sense - I’m giving more weight to certain observations in the fitting stage, but not in the prediction stage.

1 Like