Hi, I'm trying to calibrate a PyMC Bayesian model with new data. Basically, I would like to update the priors with the posterior distributions of a pre-trained model. I followed a workflow similar to this one: Updating priors — PyMC3 3.11.5 documentation. The general idea is to fit a Gaussian kernel density estimate to the posterior samples and rebuild the corresponding priors with the pymc.Interpolated distribution.
The only difference between my work and the demo notebook is that some of the coefficients in my model are multi-dimensional. I set the shape argument and pass arrays to the distribution parameters, such as mu and sigma, to build a multi-dimensional Normal distribution object. See the example code below:
import pymc3 as pm
import numpy as np

model = pm.Model()
with model:
    ...
    # a 3-dimensional Normal prior; mu and sigma are given per-dimension values
    beta = pm.Normal(
        name="beta",
        mu=np.array([1.0, 2.0, 3.0]),
        sigma=np.array([1.0, 2.0, 3.0]),
        shape=3,
    )
    ...
    trace = pm.sample(draws=2000, tune=2000, ...)
Then I modified the from_posterior function with np.apply_along_axis to generate a batch of x_points and pdf_points, so that I can similarly pass arrays of shape (3, 1002) to the arguments x_points and pdf_points, where each row parameterizes one dimension of the distribution. See the code below:
import pymc3 as pm
from scipy import stats
import numpy as np

# I modified the function a little bit because my sampled posteriors are multi-dimensional
def from_posterior(tag, samples, bins=1000):
    if samples.ndim == 1:
        samples = np.expand_dims(samples, axis=1)
    smin, smax = np.min(samples, axis=0), np.max(samples, axis=0)
    width = smax - smin
    x = np.linspace(smin, smax, bins)  # shape (bins, n_dims): one grid per column
    # per-column KDE: the first rows of arr are the samples, the last `bins` rows the grid
    y = np.apply_along_axis(
        func1d=lambda a: stats.gaussian_kde(a[:-bins])(a[-bins:]),
        axis=0,
        arr=np.concatenate((samples, x), axis=0),
    )
    x_points = np.concatenate((x[[0], :] - 3 * width, x, x[[-1], :] + 3 * width), axis=0).T
    # x_points.shape = (samples.shape[1], bins + 2)
    pdf_points = np.concatenate((np.zeros((1, y.shape[1])), y, np.zeros((1, y.shape[1]))), axis=0).T
    # pdf_points.shape = (samples.shape[1], bins + 2)
    return pm.Interpolated(tag, x_points=x_points, pdf_points=pdf_points, shape=samples.shape[1])
with model:
    ...
    beta = from_posterior(
        tag="beta",
        samples=beta_posteriors,  # beta_posteriors.shape = (8000, 3)
        bins=1000,
    )
    ...
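For what it's worth, the batched KDE part works fine on its own, outside of PyMC. A quick check with synthetic samples standing in for the real posteriors:

```python
import numpy as np
from scipy import stats

# synthetic stand-in for beta_posteriors: 8000 draws of a 3-dimensional coefficient
rng = np.random.default_rng(0)
samples = rng.normal(loc=[0.0, 5.0, -3.0], scale=1.0, size=(8000, 3))
bins = 1000

smin, smax = samples.min(axis=0), samples.max(axis=0)
x = np.linspace(smin, smax, bins)  # shape (bins, 3): one evaluation grid per column
# per-column KDE: the first rows of arr are the samples, the last `bins` rows the grid
y = np.apply_along_axis(
    lambda a: stats.gaussian_kde(a[:-bins])(a[-bins:]),
    axis=0,
    arr=np.concatenate((samples, x), axis=0),
)
print(y.shape)  # (1000, 3)
```

So the error only appears once the batched arrays reach pm.Interpolated.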
However, the code above raised this error:
ValueError: too many axes: 2 (effrank=2), expected rank=1
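For reference, the same 1-D restriction can be reproduced with scipy alone. As far as I can tell, Interpolated fits a univariate spline to (x_points, pdf_points), and that spline rejects rank-2 input (the exact error text may vary with the scipy version):

```python
import numpy as np
from scipy.interpolate import InterpolatedUnivariateSpline

# rank-2 x/y, like the (3, 1002) arrays in my from_posterior
x_points = np.tile(np.linspace(-5.0, 5.0, 1002), (3, 1))  # shape (3, 1002)
pdf_points = np.exp(-x_points ** 2 / 2)

try:
    InterpolatedUnivariateSpline(x_points, pdf_points)  # rejected: strictly univariate
except ValueError as e:
    print("ValueError:", e)

# a single row (rank-1) works fine
spline = InterpolatedUnivariateSpline(x_points[0], pdf_points[0])
print(spline(0.0))  # close to exp(0) = 1
```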
It seems the class pymc.Interpolated doesn't handle batched parameters the way pm.Normal does: its source code calls into scipy.interpolate, which doesn't support the 2-D arrays above. So I just wonder if anyone has encountered a similar problem and knows a way to set the dimension size of pm.Interpolated dynamically? Thank you very much!