Incremental updates with independent multi-dimensional parameters

Tanavya_Dimri · October 12, 2020, 9:29pm

I am trying to use PYMC3 for a Bayesian model where I would like to repeatedly train my model on new unseen data. I am thinking I would need to update the priors with the posterior of the previously trained model every time I see the data, similar to how is achieved here https://docs.pymc.io/notebooks/updating_priors.html. They use the following function that finds the KDE from the samples and replacing each of the original definitions of the parameters in the model with a call to from_posterior.

def from_posterior(param, samples):
    smin, smax = np.min(samples), np.max(samples)
    width = smax - smin
    x = np.linspace(smin, smax, 100)
    y = stats.gaussian_kde(samples)(x)

    # what was never sampled should have a small probability but not 0,
    # so we'll extend the domain and use linear approximation of density on it
    x = np.concatenate([[x[0] - 3 * width], x, [x[-1] + 3 * width]])
    y = np.concatenate([[0], y, [0]])
    return Interpolated(param, x, y)

And here is my original model.

def create_model(batsmen, bowlers, id1, id2, X):
    testval = [[-5,0,1,2,3.5,5] for i in range(0, 9)]
    l = [i for i in range(9)]
    model = pm.Model()
    with model:
        delta_1 = pm.Uniform("delta_1", lower=0, upper=1)
        delta_2 = pm.Uniform("delta_2", lower=0, upper=1)
        inv_sigma_sqr = pm.Gamma("sigma^-2", alpha=1.0, beta=1.0)
        inv_tau_sqr = pm.Gamma("tau^-2", alpha=1.0, beta=1.0)
        mu_1 = pm.Normal("mu_1", mu=0, sigma=1/pm.math.sqrt(inv_tau_sqr), shape=len(batsmen))
        mu_2 = pm.Normal("mu_2", mu=0, sigma=1/pm.math.sqrt(inv_tau_sqr), shape=len(bowlers))
        delta = pm.math.ge(l, 3) * delta_1 + pm.math.ge(l, 6) * delta_2
        eta = [pm.Deterministic("eta_" + str(i), delta[i] + mu_1[id1[i]] - mu_2[id2[i]]) for i in range(9)]
        cutpoints = pm.Normal("cutpoints", mu=0, sigma=1/pm.math.sqrt(inv_sigma_sqr), transform=pm.distributions.transforms.ordered, shape=(9,6), testval=testval)
        X_ = [pm.OrderedLogistic("X_" + str(i), cutpoints=cutpoints[i], eta=eta[i], observed=X[i]-1) for i in range(9)]
    return model

Here, the problem is that some of my parameters such as mu_1, are multidimensional but independent. This is why I get the following error:

ValueError: points have dimension 1, dataset has dimension 1500

because of the line y = stats.gaussian_kde(samples)(x).

I realised this is because the from_posterior function seems to work only with univariate data, so I am skipping vectorisation like follows:

mu_1_trace = trace["mu_1"].T
mu_2_trace = trace["mu_2"].T
cutpoints_trace = trace["cutpoints"].T
def update_model(batsmen, bowlers, id1, id2, X, trace):
    testval = [[-5,0,1,2,3.5,5] for i in range(0, 9)]
    l = [i for i in range(9)]
    model = pm.Model()
    with model:
        cutpoints = [np.asarray([from_posterior("cutpoints_%d_%d" % (i, j), cutpoints_trace[i][j]) for i in range(6)]) for j in range(9)]
        delta_1 = from_posterior("delta_1", trace["delta_1"])
        delta_2 = from_posterior("delta_2", trace["delta_2"])
        mu_1 = np.asarray([from_posterior("mu_1_%d" % i, mu_1_trace[i]) for i in range(len(batsmen))])
        mu_2 = np.asarray([from_posterior("mu_2_%d" % i, mu_2_trace[i]) for i in range(len(bowlers))])
        delta = pm.math.ge(l, 3) * delta_1 + pm.math.ge(l, 6) * delta_2
        eta = [delta[i] + mu_1[id1[i]] - mu_2[id2[i]] for i in range(9)]
        X_ = [pm.OrderedLogistic("X_" + str(i), cutpoints=cutpoints[i], eta=eta[i], observed=X[i]-1) for i in range(9)]
    return model

But this is also not working as it fails when I try to find the cutpoints. No matter what I try, I get some error like TypeError: copy() takes no arguments (1 given) or ValueError: setting an array element with a sequence.

Can someone please help me make this work for multi-dimensional parameters?
Thank you in advance!!

Topic		Replies	Views
Can traces be used as priors? Questions	5	2367	September 12, 2019
Parameterize a multi-dimensional pymc.Interpolated distribution version agnostic	2	815	July 23, 2024
How to get "on-line" prediction estimates for data over time? Questions	6	771	May 19, 2020
Best way to use posterior as a prior for another analysis? Questions	6	587	December 10, 2024
Drawing from posterior of a Multivariate Distribution Questions	10	1566	December 22, 2018

Incremental updates with independent multi-dimensional parameters

Related topics