Hi all,
I am trying to fit a B-splines model to some longitudinal data. I defined the model like this (using a skewed normal instead of a regular normal distribution as the likelihood, see https://www.pymc.io/projects/docs/en/latest/api/distributions/generated/pymc.SkewNormal.html).
COORDS = {"splines": np.arange(numKnots+2)}
with pm.Model(coords=COORDS) as model:
x = pm.MutableData('x', value=data.x)
B = dmatrix(
"bs(x, knots=knots, degree=3, include_intercept=True) - 1",
{"x": x.value, "knots": iKnots},
)
a = pm.Normal("a", 0, 3)
w = pm.Normal("w", mu=0, sigma=2, dims="splines")
mu = pm.Deterministic("mu", a + pm.math.dot(np.asarray(B, order="F"), w.T))
sigma = pm.HalfStudentT("sigma", nu=4, sigma=1)
alpha = pm.Normal("alpha", mu=0, sigma=5)
# Compute the mean of a skewed normal distribution.
mean = pm.Deterministic("mean", mu + sigma * np.sqrt(2/np.pi) * (alpha/np.sqrt(1 + np.power(alpha, 2))))
y = pm.SkewNormal("y", mu=mu, sigma=sigma, alpha=alpha, observed=data.y,
shape=len(data.x))
I was wondering if it is possible to make out-of-sample posterior predictions with this model as I would like to compute the mean (see definition in the model specification) for values that lie in between those in data.x
.
I first tried simply using pm.set_data({'x': newData.x})
and then run pm.sample_posterior_predictive(iData, var_names=['mean'], random_seed = 6)
but this just produced predictions for the original x
data.
So I then tried the following:
with model:
x_new = list(range(430)) # these are just some random new data that lie within the limits of data.x
B = dmatrix(
"bs(x, knots=knots, degree=3, include_intercept=True) - 1",
{"x": x_new, "knots": iKnots},
)
mu_hat = pm.Deterministic("mu_hat", a + pm.math.dot(np.asarray(B, order="F"), w.T))
mean_hat = pm.Deterministic("mean_hat", mu_hat + sigma * np.sqrt(2/np.pi) * (alpha/np.sqrt(1 + np.power(alpha, 2))))
ppc = pm.sample_posterior_predictive(iData, var_names=["mu_hat", "mean_hat"], random_seed=6)
because I thought that B
and mu
need to be reevaluated when passing new data. However, this results in really big HDIs for mu_hat
and mean_hat
. So I’m wondering what I’m doing wrong here and what’s the proper way to do it (if possible).
Thanks in advance!