Thanks for making pymc3! I'm a beginner in probabilistic programming, but I am already very impressed with what I can do with pymc3!
I am having problems getting Minibatch to work. The docs have broken formatting that makes them very hard to read: https://docs.pymc.io/api/data.html
There is a brief tutorial, but it doesn't solve my problem: https://docs.pymc.io/notebooks/variational_api_quickstart.html#Minibatches
I have also searched GitHub, StackOverflow and Discourse, but I'm still not sure how to do this.
Perhaps you could update your docs and tutorial with this kind of case, as it is probably quite common.
Simplified Example:
I have data with X and Y where the distribution of Y depends on X. For example:
import numpy as np
import pymc3 as pm

# Number of data points.
n = 1000000
# Random data for X.
X = np.random.uniform(0, 10, size=n)
# Random data for Y, which depends on X.
noise = np.random.normal(size=n)
Y = 7.25 + 2.5 * X + 5.3 * noise
# Find the parameters for the relation between X and Y.
with pm.Model() as model:
    # Prior parameters.
    a = pm.Normal('a', mu=0, sigma=10)
    b = pm.Normal('b', mu=0, sigma=10)
    sigma = pm.HalfNormal('sigma', sigma=10)
    # Relation between X and the mean of Y.
    mu = a + b * X
    # Observed output Y.
    Y_obs = pm.Normal('Y_obs', mu=mu, sigma=sigma, observed=Y)
    # Fit a variational approximation to the posterior (needs the model context).
    approx = pm.fit()
# Sample from the approximate posterior distributions for the parameters.
trace = approx.sample(draws=500)
# Plot the distributions for the posterior parameters.
pm.traceplot(trace)
This runs quite slowly because there are so many data points, so I want to use mini-batches. However, it is unclear to me how I can use pm.Minibatch to draw from both X and Y simultaneously.
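To make the question concrete, here is a minimal NumPy-only sketch of what I mean by drawing from X and Y simultaneously: every batch has to use the same random indices for both arrays, so that each x stays paired with its own y. The helper name `draw_paired_batch` and the batch size of 500 are hypothetical, chosen only for illustration. This is not pymc3 code, just the behaviour I would like to get out of pm.Minibatch.

```python
import numpy as np


def draw_paired_batch(X, Y, batch_size, rng):
    """Return a random batch of X and Y selected by the SAME indices.

    Hypothetical helper for illustration only; not part of pymc3.
    """
    # One set of random row indices (sampled with replacement) ...
    idx = rng.integers(0, len(X), size=batch_size)
    # ... applied to both arrays, so the (x, y) pairs stay aligned.
    return X[idx], Y[idx]


# Same synthetic data as in the example above.
rng = np.random.default_rng(0)
n = 1000000
X = rng.uniform(0, 10, size=n)
Y = 7.25 + 2.5 * X + 5.3 * rng.normal(size=n)

# Assumed batch size, picked arbitrarily for this sketch.
X_batch, Y_batch = draw_paired_batch(X, Y, batch_size=500, rng=rng)
```

If X and Y were instead sampled with two independent sets of indices, the pairing would be lost and the relation between X and Y could not be recovered from the batches.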
Thanks!