pm.sample_posterior_predictive() not working with weights

mattiasthalen · August 26, 2020, 11:42am

Hi,

I’m trying to account for weights in my model using pm.Potential(). It works well until I try to run pm.sample_posterior_predictive() on it: it starts and finishes within a second, and the result is empty.

If I add weights, am I no longer able to sample from the posterior predictive?

import pymc3 as pm

# x_obs, y_obs, group_id, n_groups and weights are the user's data arrays
with pm.Model() as model:
    # Data
    x_obs_shared = pm.Data('x_obs_shared', x_obs)
    group_id_shared = pm.Data('group_id_shared', group_id)

    # Hyperpriors for group nodes
    alpha_mu = pm.Gamma('alpha_mu', mu = 100, sigma = 50)
    alpha_sigma = pm.HalfCauchy('alpha_sigma', beta = 5.0)

    beta_mu = pm.Gamma('beta_mu', mu = 100, sigma = 50)
    beta_sigma = pm.HalfCauchy('beta_sigma', beta =  5.0)

    # Priors
    alpha = pm.Gamma('alpha', mu = alpha_mu, sigma = alpha_sigma, shape = n_groups)

    beta = pm.Gamma('beta', mu = beta_mu, sigma = beta_sigma, shape = n_groups)
    beta_negative = pm.Deterministic('beta_negative', -beta)

    sigma = pm.HalfCauchy('sigma', beta =  5.0)
    nu = pm.InverseGamma('nu', alpha = 1, beta = 1)
    
    # Expected values
    y_est = alpha[group_id_shared] + beta_negative[group_id_shared]*x_obs_shared

    # Data likelihoods
    logp = weights * pm.StudentT.dist(nu = nu, mu = y_est, sigma = sigma).logp(y_obs)
    y_like = pm.Potential('y_like', logp)
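For integer weights, multiplying each observation's log-density by its weight (as the Potential above does) is the same as counting that observation w_i times. A minimal NumPy sketch of that equivalence, using a Normal log-density in place of the StudentT for brevity; the helper `normal_logpdf` and the toy values are hypothetical:

```python
import numpy as np

def normal_logpdf(y, mu, sigma):
    # Elementwise log-density of Normal(mu, sigma) evaluated at y
    return -0.5 * np.log(2 * np.pi * sigma**2) - (y - mu) ** 2 / (2 * sigma**2)

y = np.array([0.3, -1.2, 2.5])
weights = np.array([1, 3, 2])
mu, sigma = 0.5, 1.5

# What the Potential adds to the model log-probability: sum of w_i * logp(y_i)
weighted = np.sum(weights * normal_logpdf(y, mu, sigma))

# The same quantity obtained by repeating each observation w_i times
replicated = np.sum(normal_logpdf(np.repeat(y, weights), mu, sigma))
```

A Potential only adds such a number to the model's log-probability; it does not define a distribution that can be drawn from, which is why forward-sampling functions skip it.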

AlexAndorra · August 27, 2020, 8:31am

Ah yeah, unfortunately forward-sampling functions (i.e. sample_prior_predictive and sample_posterior_predictive) don’t take Potentials into account yet.
All is not lost though, as you can do it by hand. I’m thinking of something like:

ppc =  weights * pm.StudentT.dist(nu = trace["nu"], mu = trace["y_est"], sigma = trace["sigma"]).random(size=1000)

@junpenglao am I not too wrong here? Is there a better way to do it?

junpenglao · August 27, 2020, 9:21am

I don’t think that’s correct. Imagine weights is [1., 0., 0., …]: you would expect to get samples only from StudentT(nu=nu[0], mu=y_est[0], sigma=sigma[0]), but returning weights * random_ppc_samples will give you a whole bunch of 0s.
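A quick NumPy illustration of this point, with hypothetical values standing in for the posterior draws: multiplying predictive draws by weights [1, 0, 0] zeroes out the last two observations instead of restricting the draws to the first one.

```python
import numpy as np

rng = np.random.default_rng(42)
weights = np.array([1.0, 0.0, 0.0])
# Hypothetical StudentT parameters (stand-ins for posterior draws)
nu, mu, sigma = 4.0, np.array([1.0, 2.0, 3.0]), 0.5

# Predictive draws: one row per draw, one column per observation
draws = mu + sigma * rng.standard_t(nu, size=(1000, 3))
scaled = weights * draws  # the proposed "weights * ppc" shortcut
```

The zeros in `scaled` are not samples from any predictive distribution; they are just the result of the multiplication.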

Unfortunately, I don’t think there is a principled way to do this other than encoding the weights into your prior. Usually it is done through the prior on sigma (i.e., larger weight -> more information in the observation -> smaller sigma). There is also a similar discussion on the Stan forum: https://discourse.mc-stan.org/t/bayesian-parallels-of-weighted-regression/16152/7
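A minimal sketch of putting the weights into the scale of the likelihood, assuming the weights are proportional to each observation's precision (1/variance), which is how statsmodels' WLS interprets them. The helper `weighted_sigma` is hypothetical, and normalizing by the mean weight is a choice so that `sigma` keeps its meaning for an average-weight observation. Because the weights now live inside the likelihood rather than a Potential, forward sampling works as usual:

```python
import numpy as np

def weighted_sigma(sigma, weights):
    """Hypothetical helper: per-observation scale when weights ~ 1 / variance."""
    w = np.asarray(weights, dtype=float)
    w = w / w.mean()  # the average weight becomes 1
    # variance_i = sigma**2 / w_i, so sd_i = sigma / sqrt(w_i)
    return sigma / np.sqrt(w)

weights = np.array([1.0, 2.0, 4.0])
sig = weighted_sigma(1.0, weights)  # larger weight -> smaller sigma

# Forward sampling with per-observation scales: one column per observation
rng = np.random.default_rng(0)
mu = np.zeros(3)
draws = rng.normal(mu, sig, size=(4000, 3))
```

In a PyMC model the same expression could be passed as the `sigma` argument of the observed likelihood.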

mattiasthalen · August 27, 2020, 12:17pm

From what I gathered there, one way could be to include the weights as a second predictor,
i.e. going from y = a + b0*x to y = a + b0*x + b1*w.
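A minimal least-squares sketch of the second-predictor idea (ordinary least squares via NumPy, not the Bayesian model itself). The data are synthetic and noise-free so the recovered coefficients can be checked, and the last lines show that predicting a new point needs a value of w:

```python
import numpy as np

x = np.arange(10, dtype=float)
w = np.array([1, 3, 2, 5, 4, 1, 2, 3, 5, 4], dtype=float)
# Synthetic, noise-free target: y = a + b0*x + b1*w with a=1, b0=2, b1=0.5
y = 1.0 + 2.0 * x + 0.5 * w

# Design matrix: intercept column, x, and the weights as a covariate
X = np.column_stack([np.ones_like(x), x, w])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

# Predicting a new observation requires both x_new and w_new
x_new, w_new = 12.0, 2.0
y_new = coef @ np.array([1.0, x_new, w_new])
```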

The caveat I see is that I would need to include weights when predicting new data, but that doesn’t bother me much.

As for your suggestion of using the weights in the prior for sigma, would I define it like sigma = pm.HalfCauchy('sigma', beta = 1 - w_obs)?

junpenglao · August 28, 2020, 7:33am

Yeah, something like that should work. y = a + b0*x + b1*w also makes sense to me if you think of linear regression as “explaining away the variance”.

AlexAndorra · August 28, 2020, 1:16pm

Ah yeah, good point. I always learn something when Junpeng is around

nkaimcaudle · September 2, 2020, 4:01am

Shouldn’t the weight go into the likelihood rather than the prior?

Something like:

sigma = pm.HalfCauchy('sigma', beta = 5.0)
...
logp = weights * pm.StudentT.dist(nu = nu, mu = y_est, sigma = sigma*(1-w_obs)).logp(y_obs)

assuming w_obs is strictly less than 1 (so that sigma*(1-w_obs) stays positive).

mattiasthalen · September 2, 2020, 8:33pm

I’m afraid adding the weights as part of sigma didn’t work like that. I don’t think I should add the weights via observed either; at least, it makes the plot_ppc look terrible.

Currently, my full model looks like this:

with pm.Model() as linear_model:
    # Shared Data
    load_obs_shared = pm.Data('load_obs_shared', load_obs)
    velocity_obs_shared = pm.Data('velocity_obs_shared', vel_obs)
    exercise_id_shared = pm.Data('exercise_id_shared', exercise_id)

    # Hyperpriors
    alpha_mu = pm.Gamma('alpha_mu', mu = 120, sigma = 5)
    alpha_sigma = pm.HalfCauchy('alpha_sigma', beta = 1)
    beta_mu = pm.Gamma('beta_mu', mu = 80, sigma = 5)
    beta_sigma = pm.HalfCauchy('beta_sigma', beta = 1)

    # General Priors
    alpha = pm.Gamma('alpha', mu = alpha_mu, sigma = alpha_sigma, shape = n_exercises)
    beta = pm.Gamma('beta', mu = beta_mu, sigma = beta_sigma, shape = n_exercises)
    beta_negative = pm.Deterministic('beta_negative', -beta)

    # Normal Model (y ~ x)
    load_sigma = pm.HalfCauchy('load_sigma', beta = 1)
    load_nu = pm.InverseGamma('load_nu', alpha = 1, beta = 1)
    load_est = alpha[exercise_id_shared] + beta_negative[exercise_id_shared]*velocity_obs_shared
    load_like = pm.StudentT('load_like', nu = load_nu, mu = load_est, sigma = load_sigma, observed = load_obs_shared)

    # Inverse Model (x ~ y)
    velocity_sigma = pm.HalfCauchy('velocity_sigma', beta = 1)
    velocity_nu = pm.InverseGamma('velocity_nu', alpha = 1, beta = 1)
    velocity_est = (load_obs_shared - alpha[exercise_id_shared])/beta_negative[exercise_id_shared]
    velocity_like = pm.StudentT('velocity_like', nu = velocity_nu, mu = velocity_est, sigma = velocity_sigma, observed = velocity_obs_shared)

And I’m looking to include the weights here:

# Normal Model (y ~ x)
load_sigma = pm.HalfCauchy('load_sigma', beta = 1)
...
# Inverse Model (x ~ y)
velocity_sigma = pm.HalfCauchy('velocity_sigma', beta = 1)

junpenglao · September 3, 2020, 5:41am

Modelling y~x and x~y in the same model doesn’t make much sense to me: you are essentially double dipping by using the data twice, and you also make the model non-generative (there is a circular dependency in your graph).

mattiasthalen · September 3, 2020, 6:10am

A bit off topic, but would it be better to handle them as two separate models?

On topic, could you provide an example of how to include the weights in the sigma prior?

Thank you all for your help, this is an awesome community!

junpenglao · September 4, 2020, 7:06am

yep

Taking the example from statsmodels, it goes something like this:

import numpy as np
import statsmodels.api as sm

Y = np.asarray([1,3,4,5,2,3,4]) + np.arange(1,8) - 5
x = np.arange(1,8)
weights = np.arange(1,8)
X = sm.add_constant(x)
wls_model = sm.WLS(Y, X, weights)
results = wls_model.fit()
results.params
# ==> array([-2.08333333,  1.0952381 ])

import pymc3 as pm
with pm.Model() as m:
    intercept = pm.Normal('c', 0., 10.)
    beta = pm.Normal('b', 0., 5.)
    # Total variance
    sigma = pm.HalfNormal('s', 2.)
    # Scale by weights
    # The weights are presumed to be (proportional to) the 
    # inverse of the variance of the observations.
    tau = 1 / sigma**2 * (weights / weights.sum())
    pm.Normal('y', beta*x + intercept, tau=tau, observed=Y)
    trace = pm.sample(return_inferencedata=True)

pm.summary(trace)['mean']
# c   -1.971
# b    1.076
# s    0.618
# Name: mean, dtype: float64

There are some divergences, so the prior needs more care, but that’s the general idea.

junpenglao · September 4, 2020, 7:11am

Even closer to WLS would be:

with pm.Model() as m2:
    intercept = pm.Normal('c', 0., 10.)
    beta = pm.Normal('b', 0., 5.)
    pm.Normal('y', beta*x + intercept, tau=weights, observed=Y)
    map_result = pm.find_MAP()
map_result
# ==> {'c': array(-2.07381709), 'b': array(1.09348298)}
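Because `m2` fixes each observation's precision to its weight and uses zero-mean Normal priors, its MAP has a closed form: weighted least squares with the prior precisions (1/10² for the intercept, 1/5² for the slope) added to the diagonal. A NumPy sketch that should reproduce both the statsmodels WLS estimate (no prior) and the `find_MAP` result above, up to optimizer tolerance:

```python
import numpy as np

Y = np.asarray([1, 3, 4, 5, 2, 3, 4]) + np.arange(1, 8) - 5
x = np.arange(1, 8)
weights = np.arange(1, 8)
X = np.column_stack([np.ones(7), x])  # intercept first, like sm.add_constant
W = np.diag(weights.astype(float))

# Weighted least squares: solve (X^T W X) b = X^T W Y
XtWX = X.T @ W @ X
XtWY = X.T @ W @ Y
wls = np.linalg.solve(XtWX, XtWY)

# MAP of m2: Normal(0, 10) on the intercept and Normal(0, 5) on the slope
# contribute their precisions to the diagonal
prior_precision = np.diag([1 / 10**2, 1 / 5**2])
map_est = np.linalg.solve(XtWX + prior_precision, XtWY)
```

The small gap between `wls` and `map_est` is the shrinkage from the priors.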

omrihar · March 23, 2023, 12:06pm

I’m trying to achieve a similar thing in the context of a Gamma GLM.
I’ve followed this topic to create a potential which I can weight.

I have an idea for sampling from the posterior predictive, but I’m not convinced it’s correct: create a second model that replaces the Potential with a Gamma likelihood, and use that model only for posterior predictive sampling.

For example, if this is the model I’m using for sampling:

import pymc as pm
import aesara.tensor as at  # PyMC v4; with PyMC v5 use pytensor.tensor instead

# x = regressors
# y = target
# w = weights

with pm.Model() as m1:
    α = pm.Normal("α", 0, 1)
    β = pm.Normal("β", 0, 1)
    shape = pm.Uniform("shape", 0, 100)

    # Log-link function
    μ = at.exp(α + β*x)

    logp = w * pm.logp(pm.Gamma.dist(alpha=shape, beta=shape/μ), y)
    potential = pm.Potential("potential", logp)

    trace = pm.sample()

and then use this one for prediction:

with pm.Model() as m2:
    α = pm.Normal("α", 0, 1)
    β = pm.Normal("β", 0, 1)
    shape = pm.Uniform("shape", 0, 100)

    # Log-link function
    μ = at.exp(α + β*x)

    # Same likelihood as the Potential in m1, but unweighted and with a name
    pm.Gamma("y", alpha=shape, beta=shape/μ, observed=y)

with m2:
    trace.extend(pm.sample_posterior_predictive(trace))

Would that be correct? I’m aware that the weights are not taken into account in the posterior predictive, but that makes sense: I’m giving more weight to certain observations in the fitting stage, but not in the prediction stage.
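For reference, a NumPy sketch of what sampling the posterior predictive of `m2` amounts to: for each posterior draw, compute μ = exp(α + βx) and draw from a Gamma with shape `shape` and rate `shape/μ`, whose mean is μ. NumPy's `gamma` is parameterized by a scale, i.e. 1/rate = μ/shape. The arrays `alpha_draws`, `beta_draws` and `shape_draws` are hypothetical placeholders for posterior samples (in practice they would come from `trace.posterior`):

```python
import numpy as np

rng = np.random.default_rng(1)
n_draws = 2000
x = np.array([0.0, 1.0, 2.0])

# Hypothetical posterior draws (constant here so the result is easy to check)
alpha_draws = np.full(n_draws, 0.5)
beta_draws = np.full(n_draws, 0.3)
shape_draws = np.full(n_draws, 5.0)

# Log link: one row per posterior draw, one column per observation
mu = np.exp(alpha_draws[:, None] + beta_draws[:, None] * x[None, :])

# Gamma(alpha=shape, beta=shape/mu) has mean mu; NumPy uses scale = 1/beta
scale = mu / shape_draws[:, None]
y_rep = rng.gamma(shape_draws[:, None], scale)  # no weights involved here
```

The weights only influence the posterior draws of α, β and shape; the predictive draws themselves are unweighted, as described in the post.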
