Performance speedup for updating posterior with new data

Hi All

I’ve been developing a Poisson regression model with the capability to incrementally update the posterior as new data becomes available (online learning).

I’ve been using @davidbrochart’s Updating Priors notebook as a guide for how to do this in PyMC3. I’ve attached an example to this post.
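For reference, the prior-updating step in that notebook builds an Interpolated prior from the previous round’s posterior samples via a Gaussian KDE, roughly like this (simplified sketch of the notebook’s helper, to be called inside a model context):

import numpy as np
import pymc3 as pm
from scipy import stats

def from_posterior(param, samples):
    # Approximate the previous posterior with a Gaussian KDE evaluated on a grid,
    # then wrap it in an Interpolated distribution to use as the new prior.
    smin, smax = np.min(samples), np.max(samples)
    width = smax - smin
    x = np.linspace(smin, smax, 100)
    y = stats.gaussian_kde(samples)(x)
    # Pad the support so the new prior is non-zero a bit beyond the observed samples.
    x = np.concatenate([[x[0] - 3 * width], x, [x[-1] + 3 * width]])
    y = np.concatenate([[0], y, [0]])
    return pm.Interpolated(param, x, y)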

The problem I’m having is that although the initial sampling takes 1 to 2 seconds (on 100 data points), each update jumps to about 15 seconds per new data point (I’m feeding in just one new point for each update of the posterior). I’ve run this on a more powerful virtual machine (non-GPU) and the minimum I’ve got it down to is 6 to 10 seconds.

I’m wondering if there’s something I’m missing that could speed up the updates. It seems like a lot of effort is spent initialising a new model every time; is there a way to bypass this, given that we’ve just run a similar model?

Super appreciate any help on this.

Thanks.
poisson_regression_online_learning.py (3.7 KB)


I had a look at your code. Unfortunately, I don’t see much more room to improve the speed: the Interpolated RV always needs to be re-initialized, and the bottleneck in evaluating the SciPy spline smoothing function is also difficult to improve without rewriting everything in Theano.

Alternatively, if you are happy with using some other distribution as an approximation, it would simplify the code a lot:

import pandas as pd
import theano
import pymc3 as pm

# `generate_data`, `trace` and `traces` come from the initial fit in the attached
# script: `trace` holds the posterior samples of the first model and `traces`
# collects the trace of every update.
new_data = pd.DataFrame(data=[generate_data() for _ in range(1)],
                        columns=generate_data().keys())
output_shared = theano.shared(new_data['output'].values)
x1_shared = theano.shared(new_data['x1'].values)

# Summarise the previous posterior in shared variables so they can be swapped
# out later without rebuilding the model.
mu0 = theano.shared(trace['alpha_0'].mean(), name='hyper_mu0')
sd0 = theano.shared(trace['alpha_0'].std(), name='hyper_sd0')
mu1 = theano.shared(trace['alpha_1'].mean(), name='hyper_mu1')
sd1 = theano.shared(trace['alpha_1'].std(), name='hyper_sd1')

with pm.Model() as update_model:
    # Student-t priors centred on the previous posterior mean/std.
    alpha_0 = pm.StudentT('alpha_0', mu=mu0, sd=sd0, nu=1, testval=0.)
    alpha_1 = pm.StudentT('alpha_1', mu=mu1, sd=sd1, nu=1, testval=0.)
    theta = alpha_0 + alpha_1 * x1_shared

    likelihood = pm.Poisson(
        'output',
        mu=pm.math.exp(theta),
        observed=output_shared,
    )
    trace = pm.sample(1000)
    traces.append(trace)

for _ in range(1, 10):
    # Simulate one new observation and update the shared data and priors in place.
    new_data = pd.DataFrame(data=[generate_data() for _ in range(1)],
                            columns=generate_data().keys())
    output_shared.set_value(new_data['output'].values)
    x1_shared.set_value(new_data['x1'].values)
    mu0.set_value(trace['alpha_0'].mean())
    sd0.set_value(trace['alpha_0'].std())
    mu1.set_value(trace['alpha_1'].mean())
    sd1.set_value(trace['alpha_1'].std())

    # Re-sample the same model; only the shared values have changed.
    with update_model:
        trace = pm.sample(1000)
        traces.append(trace)

Thanks for looking at the code. To be honest, your code changes have dropped the runtime per update from 15 s down to a few seconds, so that’s a pretty awesome speedup in any case! Shame about the other bottlenecks, though; I figured not much can be done if the cost comes from elsewhere.

In the original code, the priors of alpha_0 and alpha_1 are normal distributions. When 9f0sdau90 updates the priors, they fit a Gaussian kernel density estimate to the posterior samples. Given that, I’m just curious why you chose a Student-t distribution for the updated priors instead of a normal distribution, e.g. like this:

alpha_0 = pm.Normal('alpha_0', mu=mu0, sd=sd0)
alpha_1 = pm.Normal('alpha_1', mu=mu1, sd=sd1)

Using a Student-t is more robust to outliers, as it has heavier tails.
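For intuition, here is a quick tail comparison with SciPy (illustrative only, not from the thread):

from scipy import stats

# Probability of falling more than 5 scale units above the centre:
print(stats.norm.sf(5))      # ~2.9e-07 for the normal
print(stats.t.sf(5, df=1))   # ~0.063 for the Student-t with nu=1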

Is there a way of doing such an update for a vectorised Normal RV?

Here is a code example of what I’d like to do :slight_smile:

import numpy as np
import pandas as pd
import pymc3 as pm
from scipy.special import expit as logistic  # assuming `logistic` is the standard sigmoid

DRAWS = 1000
TUNE = 100

# Simulate three predictors and a binary outcome from a logistic model.
Xw1 = np.random.normal(0, 25, 1000)
Xw2 = np.random.normal(0, 25, 1000)
Xw3 = np.random.normal(0, 25, 1000)
a = 0.5
b = np.array([0.8, -0.25, 0.1])
y = 1 * (logistic(a + b[0] * Xw1 + b[1] * Xw2 + b[2] * Xw3) > 0.5)

data = pd.DataFrame({
    "Xw1": Xw1,
    "Xw2": Xw2,
    "Xw3": Xw3,
    "y": y
})

labels = ["Xw1", "Xw2", "Xw3"]

X = data[labels].to_numpy()
y = data["y"].to_numpy()

with pm.Model() as logistic_model:
    alpha = pm.Normal("alpha", mu=0.0, sigma=10)
    # <-- for the next training/inference I'd like to set the posteriors from the
    # first batch of data points as the priors for the new data points
    betas = pm.Normal("betas", mu=0.0, sigma=10, shape=X.shape[1])

    # set predictors as shared variable to change them for PPCs:
    predictors = pm.Data("predictors", X)
    p = pm.Deterministic("p", pm.math.invlogit(alpha + predictors @ betas))

    outcome = pm.Bernoulli("outcome", p=p, observed=y)

    # draw posterior samples using NUTS
    trace = pm.sample(draws=DRAWS,
                      tune=TUNE,
                      cores=2,
                      return_inferencedata=True)

I’m not a fan of naming beta coefficients one by one; it makes models hard to change, right?

You can try modeling betas as:

betas = pm.Normal("betas", mu=np.zeros(X.shape[1]), sigma=np.ones(X.shape[1]) * 10, shape=X.shape[1])

and replace mu and sigma with values computed from the posterior samples for the next batch.
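A minimal sketch of how that could look (illustrative, reusing X, y, DRAWS and TUNE from your snippet): keep mu and sigma in shared variables so the same model can be re-sampled for each batch.

import numpy as np
import theano
import pymc3 as pm

# Shared prior parameters for the vectorised betas; start from the original prior.
mu_betas = theano.shared(np.zeros(X.shape[1]), name="mu_betas")
sd_betas = theano.shared(np.ones(X.shape[1]) * 10, name="sd_betas")

with pm.Model() as logistic_model:
    alpha = pm.Normal("alpha", mu=0.0, sigma=10)
    betas = pm.Normal("betas", mu=mu_betas, sigma=sd_betas, shape=X.shape[1])
    predictors = pm.Data("predictors", X)
    p = pm.Deterministic("p", pm.math.invlogit(alpha + predictors @ betas))
    outcome = pm.Bernoulli("outcome", p=p, observed=y)
    trace = pm.sample(draws=DRAWS, tune=TUNE, return_inferencedata=True)

# Before the next batch: summarise the posterior and push it into the shared priors.
# (The new batch itself would be swapped in via pm.set_data / a Data container for y,
# and alpha's prior could be updated the same way with scalar shared variables.)
post_betas = trace.posterior["betas"]
mu_betas.set_value(post_betas.mean(dim=("chain", "draw")).values)
sd_betas.set_value(post_betas.std(dim=("chain", "draw")).values)

with logistic_model:
    trace = pm.sample(draws=DRAWS, tune=TUNE, return_inferencedata=True)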


Thank you, it should work.

Another quick question (maybe off-topic): is pm.Normal initialised this way (with shape > 1) equivalent to a multivariate normal with only the main diagonal (i.e. a diagonal covariance) in pymc3?

Yes.
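The two parameterisations describe the same density, since a vector of independent normals factorises into exactly the diagonal-covariance multivariate normal. A quick check of the densities with SciPy (illustrative):

import numpy as np
from scipy import stats

val = np.array([1.0, -2.0, 0.5])
sd = 10.0

# Sum of independent normal log-densities vs. a multivariate normal with covariance sd**2 * I.
logp_indep = stats.norm(loc=0.0, scale=sd).logpdf(val).sum()
logp_mv = stats.multivariate_normal(mean=np.zeros(3), cov=np.eye(3) * sd**2).logpdf(val)
print(logp_indep, logp_mv)  # identical up to floating point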
