Hi, I'm trying to calibrate a PyMC Bayesian model with new data. Basically, I would like to update the priors with the posterior distributions of a pre-trained model. I followed a workflow similar to this one: Updating priors — PyMC3 3.11.5 documentation. The general idea is to fit a Gaussian kernel density estimate to the posterior samples and rebuild the corresponding priors with the pymc.Interpolated distribution.
The only difference between my work and the demo notebook is that some of the coefficients in my model are multi-dimensional. I set the shape argument and pass arrays to the distribution parameters, such as mu and sigma, to build a multi-dimensional Normal distribution object. See the example code below:
import pymc3 as pm
import numpy as np

model = pm.Model()
with model:
    ...
    # a 3-dimensional Normal prior; mu and sigma are given per-dimension values
    beta = pm.Normal(
        name="beta",
        mu=np.array([1.0, 2.0, 3.0]),
        sigma=np.array([1.0, 2.0, 3.0]),
        shape=3,
    )
    ...
    trace = pm.sample(draws=2000, tune=2000, ...)
Then I modified the from_posterior function with np.apply_along_axis to generate a batch of x_points and pdf_points, so that I can similarly pass arrays of shape (3, 1002) to the arguments x_points and pdf_points, where each row parameterizes one dimension of the distribution. See the code below:
import pymc3 as pm
from scipy import stats
import numpy as np

# I modified the function a little bit because my sampled posteriors are multi-dimensional
def from_posterior(tag, samples, bins=1000):
    if samples.ndim == 1:
        samples = np.expand_dims(samples, axis=1)
    smin, smax = np.min(samples, axis=0), np.max(samples, axis=0)
    width = smax - smin
    x = np.linspace(smin, smax, bins)  # shape (bins, n_dims): one grid per column
    # per-column KDE: the first rows of arr are the samples, the last `bins` rows the grid
    y = np.apply_along_axis(
        func1d=lambda a: stats.gaussian_kde(a[:-bins])(a[-bins:]),
        axis=0,
        arr=np.concatenate((samples, x), axis=0),
    )
    x_points = np.concatenate((x[[0], :] - 3 * width, x, x[[-1], :] + 3 * width), axis=0).T
    # x_points.shape = (samples.shape[1], bins + 2)
    pdf_points = np.concatenate((np.zeros((1, y.shape[1])), y, np.zeros((1, y.shape[1]))), axis=0).T
    # pdf_points.shape = (samples.shape[1], bins + 2)
    return pm.Interpolated(tag, x_points=x_points, pdf_points=pdf_points, shape=samples.shape[1])
with model:
    ...
    beta = from_posterior(
        tag="beta",
        samples=beta_posteriors,  # beta_posteriors.shape = (8000, 3)
        bins=1000,
    )
    ...
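For what it's worth, the batched KDE part works fine on its own, outside of PyMC. A quick check with synthetic samples standing in for the real posteriors:

```python
import numpy as np
from scipy import stats

# synthetic stand-in for beta_posteriors: 8000 draws of a 3-dimensional coefficient
rng = np.random.default_rng(0)
samples = rng.normal(loc=[0.0, 5.0, -3.0], scale=1.0, size=(8000, 3))
bins = 1000

smin, smax = samples.min(axis=0), samples.max(axis=0)
x = np.linspace(smin, smax, bins)  # shape (bins, 3): one evaluation grid per column
# per-column KDE: the first rows of arr are the samples, the last `bins` rows the grid
y = np.apply_along_axis(
    lambda a: stats.gaussian_kde(a[:-bins])(a[-bins:]),
    axis=0,
    arr=np.concatenate((samples, x), axis=0),
)
print(y.shape)  # (1000, 3)
```

So the error only appears once the batched arrays reach pm.Interpolated.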
However, the code above raised this error:
ValueError: too many axes: 2 (effrank=2), expected rank=1
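For reference, the same 1-D restriction can be reproduced with scipy alone. As far as I can tell, Interpolated fits a univariate spline to (x_points, pdf_points), and that spline rejects rank-2 input (the exact error text may vary with the scipy version):

```python
import numpy as np
from scipy.interpolate import InterpolatedUnivariateSpline

# rank-2 x/y, like the (3, 1002) arrays in my from_posterior
x_points = np.tile(np.linspace(-5.0, 5.0, 1002), (3, 1))  # shape (3, 1002)
pdf_points = np.exp(-x_points ** 2 / 2)

try:
    InterpolatedUnivariateSpline(x_points, pdf_points)  # rejected: strictly univariate
except ValueError as e:
    print("ValueError:", e)

# a single row (rank-1) works fine
spline = InterpolatedUnivariateSpline(x_points[0], pdf_points[0])
print(spline(0.0))  # close to exp(0) = 1
```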
It seems the class pymc.Interpolated doesn't handle batched parameters the way pm.Normal does: its source code calls into scipy.interpolate, which doesn't support the 2-D arrays above. So I just wonder if anyone has encountered a similar problem and knows a way to set the dimension size of pm.Interpolated dynamically? Thank you very much!