Performance speedup for updating posterior with new data


#1

Hi All

I’ve been developing a Poisson regression model with the capability to incrementally update the posterior as new data becomes available (online learning).

I’ve been using @davidbrochart’s Updating Priors notebook as a guide for how to do this in pymc3. I’ve attached an example to this post.

The problem I’m having is that although the initial sampling takes 1 to 2 seconds (on 100 data points), when new data is fed in this jumps to about 15 seconds per update (I’m feeding in just one new point per update of the posterior). I’ve run this on a more powerful virtual machine (non-GPU) and the minimum I’ve got it down to is 6-10 seconds.

I’m wondering if there’s something I’m missing that could help speed up the updates. It seems like a lot of effort is spent initialising a new model every time - is there a way to bypass this, given that we’ve just run a similar model?
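For reference, my update loop follows the notebook’s pattern: each step rebuilds the model with pm.Interpolated priors fitted to the previous trace. A rough sketch of that pattern (not my exact script - from_posterior is the notebook’s KDE helper, and x1_new / output_new stand in for the incoming data point):

import numpy as np
import pymc3 as pm
import theano.tensor as tt
from scipy import stats

def from_posterior(param, samples):
    # KDE of the previous posterior samples, wrapped as an Interpolated prior
    smin, smax = np.min(samples), np.max(samples)
    width = smax - smin
    x = np.linspace(smin, smax, 100)
    y = stats.gaussian_kde(samples)(x)
    # extend the support so values just outside the old samples keep non-zero density
    x = np.concatenate([[x[0] - 3 * width], x, [x[-1] + 3 * width]])
    y = np.concatenate([[0], y, [0]])
    return pm.Interpolated(param, x, y)

# one update step: a brand-new model is built around priors from the last trace
with pm.Model():
    alpha_0 = from_posterior('alpha_0', trace['alpha_0'])
    alpha_1 = from_posterior('alpha_1', trace['alpha_1'])
    theta = alpha_0 + alpha_1 * x1_new
    pm.Poisson('output', mu=tt.exp(theta), observed=output_new)
    trace = pm.sample(1000)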

Super appreciate any help on this.

Thanks.
poisson_regression_online_learning.py (3.7 KB)


#2

I had a look at your code. Unfortunately, I don’t see much more room to improve the speed: the Interpolated RV always needs to be reinitialized, and the bottleneck in evaluating the scipy spline smoothing function is also difficult to remove without rewriting everything in Theano.

Alternatively, if you are happy to use another distribution as an approximation, it simplifies the code a lot:

import pandas as pd
import pymc3 as pm
import theano
import theano.tensor as tt

# assumes `generate_data`, the initial `trace` and the `traces` list from the
# attached script are already defined
new_data = pd.DataFrame(data=[generate_data() for _ in range(1)], columns=generate_data().keys())
output_shared = theano.shared(new_data['output'].values)
x1_shared = theano.shared(new_data['x1'].values)

# the hyper-parameters live in shared variables, so the model is built (and
# the graph compiled) only once; just the values change between updates
mu0 = theano.shared(trace['alpha_0'].mean(), name='hyper_mu0')
sd0 = theano.shared(trace['alpha_0'].std(), name='hyper_sd0')
mu1 = theano.shared(trace['alpha_1'].mean(), name='hyper_mu1')
sd1 = theano.shared(trace['alpha_1'].std(), name='hyper_sd1')

with pm.Model() as update_model:
    alpha_0 = pm.StudentT('alpha_0', mu=mu0, sd=sd0, nu=1, testval=0.)
    alpha_1 = pm.StudentT('alpha_1', mu=mu1, sd=sd1, nu=1, testval=0.)
    theta = alpha_0 + alpha_1 * x1_shared

    likelihood = pm.Poisson(
        'output',
        mu=tt.exp(theta),  # tt.exp, not np.exp, so the log-link stays in the theano graph
        observed=output_shared,
    )
    trace = pm.sample(1000)
    traces.append(trace)

for _ in range(1, 10):
    # swap in the new data point and the updated hyper-parameter values,
    # then resample the same, already-built model
    new_data = pd.DataFrame(data=[generate_data() for _ in range(1)], columns=generate_data().keys())
    output_shared.set_value(new_data['output'].values)
    x1_shared.set_value(new_data['x1'].values)
    mu0.set_value(trace['alpha_0'].mean())
    sd0.set_value(trace['alpha_0'].std())
    mu1.set_value(trace['alpha_1'].mean())
    sd1.set_value(trace['alpha_1'].std())

    with update_model:
        trace = pm.sample(1000)
        traces.append(trace)
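Note that StudentT with nu=1 is just a heavy-tailed (Cauchy-like) stand-in for the posterior; pm.Normal would work too if your marginals look roughly Gaussian, though its lighter tails make the updates less robust to surprising data points. If you want to sanity-check that the moment-matched approximation behaves sensibly across updates, a quick plot like this works (assumes matplotlib is available; `traces` is the list built above):

import matplotlib.pyplot as plt

# posterior mean +/- sd of alpha_1 after each update
means = [t['alpha_1'].mean() for t in traces]
sds = [t['alpha_1'].std() for t in traces]
plt.errorbar(range(len(means)), means, yerr=sds, fmt='o-')
plt.xlabel('update step')
plt.ylabel('alpha_1 (mean ± sd)')
plt.show()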

#3

Thanks for looking at the code. To be honest, your code changes have dropped the runtime per update from 15s down to a few seconds - so a pretty awesome speedup in any case! Shame about the other bottlenecks, though - figured not much can be done if it comes from elsewhere.