I ran into an issue where, for a censored model, the PPC looks quite good but the LOO-PIT looks quite bad (a tilted integral transform, as if the model were biased). After playing around a bit I was able to strip it down to a toy model where similar behaviour happens. It is basically a linear regression with the following differences:
1- It is censored, both from below and above (the censoring bounds can vary somewhat across observations and need not be constant for all data).
2- Each observation line has a different intercept, but all lines share the same slope.
The model is below:
import pymc as pm
import numpy as np
import arviz as az
import matplotlib.pyplot as plt


def _broadcast_border(border, data):
    """Broadcast a scalar or 1D border to the shape of the observed data."""
    if isinstance(border, (float, int)):
        if data is None:
            return np.array([border])
        else:
            return np.ones(data.shape) * border
    elif border.ndim == 1:
        border = border[:, None]
    return border


def get_model(x, noise_mu, noise_sd, a_mu, a_sd, b_mu, b_sd, nlines,
              lower, upper, observed=None, centre_data=False, rng=None):
    if rng is None:
        rng = np.random.default_rng(0)
    lower = _broadcast_border(lower, observed)
    upper = _broadcast_border(upper, observed)
    if centre_data and observed is not None:
        # Shift each line (and its borders) by that line's mean, and centre x
        observed = observed.copy()
        lower = lower.copy()
        upper = upper.copy()
        x = x.copy()
        lower -= observed.mean(axis=-1)[:, None]
        upper -= observed.mean(axis=-1)[:, None]
        observed -= observed.mean(axis=-1)[:, None]
        x -= x.mean()
    with pm.Model() as model:
        x = pm.Data("x", x)
        noise = pm.InverseGamma("noise", mu=noise_mu, sigma=noise_sd)
        a = pm.Normal("a", a_mu, a_sd)               # common slope
        b = pm.Normal("b", b_mu, b_sd, size=nlines)  # per-line intercepts
        line_mean = (a * x)[None, :] + b[:, None]
        dist = pm.Normal.dist(line_mean, noise)
        pm.Censored("obs", dist, lower=lower, upper=upper,
                    observed=observed)
    if observed is None:
        # No data supplied: draw fake observations from the prior predictive
        nsamples = 1000
        with model:
            idata = pm.sample_prior_predictive(nsamples, random_seed=rng)
        obs = idata["prior"]["obs"].to_numpy()[0, rng.integers(0, nsamples), :, :]
        return obs
    return model
So I generated and fitted my data using the following settings:
seed = 0
rng = np.random.default_rng(seed)
nlines = 100
lower = rng.integers(-1, 3, size=nlines).astype(float)
upper = rng.integers(9, 12, size=nlines).astype(float)
x = np.linspace(0, 10, num=10)

# generate fake data from the prior predictive (tight priors)
fake_data = get_model(x, noise_mu=1, noise_sd=0.1, a_mu=-1, a_sd=0.1,
                      b_mu=10, b_sd=2, nlines=nlines, lower=lower, upper=upper)

# fit to data (wider priors)
model = get_model(x, noise_mu=1, noise_sd=0.5, a_mu=-1, a_sd=0.5,
                  b_mu=10, b_sd=5, nlines=nlines, lower=lower, upper=upper,
                  observed=fake_data, centre_data=False)

with model:
    idata = pm.sample(draws=1000, tune=1000, chains=4, cores=4,
                      random_seed=rng)
    pm.sample_posterior_predictive(idata, extend_inferencedata=True,
                                   random_seed=rng)
    pm.compute_log_likelihood(idata, extend_inferencedata=True)
The PPC and LOO-PIT I get from this are:
The PPC looks relatively good, but the LOO-PIT is horrible (the parameter estimates are correct). In this case there are a couple of bad Pareto k values.
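For reference, the diagnostics come from the standard ArviZ plotting functions, along these lines:

az.plot_ppc(idata, var_names=["obs"])
az.plot_loo_pit(idata, y="obs")
plt.show()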
Of course, the sane thing to do might be to centre the data with respect to the intercepts first, but it does not help (that option is available in get_model):
Centring, I think, improves the PPC, and there are now no observations with bad Pareto k values, yet the LOO-PIT is still horrible (the parameter estimates are again correct). The LOO-PIT looks a bit like what you would get from a biased model (a tilted, noisy line), although not entirely.
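The centred run is the same fit with centre_data=True; the sketch below is indicative (the variable name model_centred and the b_mu=0 prior are my choices here, the intercept prior being moved to zero since centring removes each line's mean):

model_centred = get_model(x, noise_mu=1, noise_sd=0.5, a_mu=-1, a_sd=0.5,
                          b_mu=0, b_sd=5, nlines=nlines,
                          lower=lower, upper=upper,
                          observed=fake_data, centre_data=True)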
Lo(o) and behold, when I instead fit the data with lower and upper set to -infinity and +infinity (and also centre the data), I get the following:
The PPC is bad (it misses the accumulation of data at the borders), but the LOO-PIT looks much better, with perhaps only a bit of bias (and again no bad Pareto k values). As expected, the slope estimate is off.
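Concretely, that "no censoring" run is something like the call below (scalar infinite borders get broadcast inside get_model, so the Censored likelihood becomes effectively unbounded):

model_nocens = get_model(x, noise_mu=1, noise_sd=0.5, a_mu=-1, a_sd=0.5,
                         b_mu=0, b_sd=5, nlines=nlines,
                         lower=-np.inf, upper=np.inf,
                         observed=fake_data, centre_data=True)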
So is there something inherently wrong with applying LOO-PIT to models with censoring? I roughly know how it works (a weighted average of the probability integral transform of each observation with respect to the posterior predictive samples, where the weights are the PSIS importance weights derived from each sample's likelihood), but I can't really see why the LOO-PIT should look so bad given the PPC. Perhaps I could look at the integral transforms one by one and see if there is an outlier with a high weight, but I suspect that probably wouldn't be the answer.
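In case it helps, this is roughly how I would inspect the per-observation PIT values (a minimal sketch using az.loo_pit; the 0.05/0.95 cutoffs are arbitrary):

# az.loo_pit computes, for each observation i, roughly
#   PIT_i = sum_s w_is * 1[y_hat_is <= y_i]
# where w_is are the PSIS-smoothed importance weights.
pit = np.asarray(az.loo_pit(idata, y="obs"))  # shape (nlines, len(x))

# PIT values piling up near 0 or 1 are what tilt the LOO-PIT plot;
# check whether they coincide with observations at the censoring borders.
extreme = (pit < 0.05) | (pit > 0.95)         # arbitrary cutoffs
censored = (fake_data <= lower[:, None]) | (fake_data >= upper[:, None])
print("extreme PIT fraction overall:", extreme.mean())
print("extreme PIT fraction among censored points:", extreme[censored].mean())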
PS: I have tried a case where lower and upper are the same for all data (0 and 10 respectively); it still has the same problem.
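(Scalar borders are broadcast by _broadcast_border, so that case is just:)

fake_const = get_model(x, noise_mu=1, noise_sd=0.1, a_mu=-1, a_sd=0.1,
                       b_mu=10, b_sd=2, nlines=nlines, lower=0.0, upper=10.0)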