It is possible to use the MeanField
and FullRank
approximations as priors for a new fit?
I’m thinking of a use case where a model is trained on some initial data, the posterior approximation is stored, and then the model is updated on new data at a later date. So structurally like mini batching the data, but where the batches are observed over time (and potentially even a fully online use case, but not necessarily).
Here is a working example learning a normal mean with 20 batches of size 15, surprisingly it works (even with really small number of passes per minibatch)! Each batch gets closer to the correct value, and the parameter uncertainty sensibly gets lower and lower, which is super.
import numpy as np
import pymc3 as pm
def simulate_data():
return np.random.normal(6.5, 1.5, 15)
def mean_model(data, prior=None):
"""Returns a model of the data mean given a prior on the mean."""
with pm.Model() as model:
mu = pm.Normal('mu', **prior)
y = pm.Normal('y', mu=mu, sd=1.5, observed=data) # assume known sd
return model
def train(model):
"""Fits the model, returns a dict of posterior distributions."""
with model:
result = pm.fit(2000, method='advi')
return {
'mu': result.mean.eval()[0],
'sd': result.std.eval()[0],
}
if __name__ == '__main__':
prior = {'mu': 0.0, 'sd': 10.0} # initial prior
for i in range(20):
y = simulate_data()
model = mean_model(y, prior)
posterior = train(model)
prior = posterior
print(i, ':', prior)
My main questions are:
- Is there a better way to do this in pymc?
- If not, what would be everyone’s dream API to make this really easy for complex models?
- Is this statistically correct? (it feels so, since it’s really just applying bayes rule)