Variational inference over cartesian product of large sets of observations


#1

I have 2 large sets of observations and I would like to do variational inference over the cartesian product of these sets. How do I use pymc3.Minibatch to get representative samples?

For example, suppose the observations are vectors and I want to model the distribution of the dot product of samples from the 2 sets.

Something like:

import pymc3 as pm

with pm.Model() as model:
  A = pm.Minibatch(a, 100)  # random batch of 100 items from a
  B = pm.Minibatch(b, 100)  # random batch of 100 items from b
  C = pm.Deterministic('C', A.dot(B))
  N = pm.Normal('N', 0, 100, observed=C)
  fit = pm.fit()

except I do not think the above will sample fairly from the cartesian product of a and b.

How do I do something like the above but sampling uniformly over the cartesian product of a and b?


#2

What do you mean by:

The minibatches are synced across the different inputs, so you should be fine doing this. Also, since a and b are observed, you can compute C first and minibatch C.


#3

Thanks! a and b are too large to precompute C, particularly since the function I actually need to compute returns high-dimensional vectors.
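To make the constraint concrete, here is a minimal sketch of drawing pairs uniformly from the cartesian product without ever materializing C. It is plain Python, not pymc3: the helper name `sample_pair_batch` and the toy data are made up for illustration, and wiring such batches into a pymc3 model is not shown.

```python
import random

def sample_pair_batch(a, b, batch_size, rng):
    """Draw index pairs uniformly from the cartesian product of a and b
    and compute the dot product only for the sampled pairs."""
    n_a, n_b = len(a), len(b)
    batch = []
    for _ in range(batch_size):
        k = rng.randrange(n_a * n_b)  # flat index into the virtual n_a x n_b grid
        i, j = divmod(k, n_b)         # i indexes a, j indexes b
        dot = sum(x * y for x, y in zip(a[i], b[j]))
        batch.append((i, j, dot))
    return batch

# Toy data (hypothetical): 3 vectors in a, 2 vectors in b.
a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b = [[2.0, 3.0], [4.0, 5.0]]
batch = sample_pair_batch(a, b, batch_size=4, rng=random.Random(0))
```

Each flat index k is equally likely, so every pair (i, j) has probability 1 / (n_a * n_b), and only batch_size dot products are computed per step.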

Also, do I need to give the 2 Minibatches different seeds? It looks like by default all minibatches initialize their seed to 42. Could that cause problems?


#4

I don't think you would want to set different seeds, as I understand that you would want the minibatches to be in sync.


#5

I do not think I want them in sync. I want it to be possible for any item in a to pair with any item in b with equal probability. If they are in sync, then I think most pairings can never happen.

Or am I thinking about this wrong?
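A small sketch of the concern, in plain Python rather than pymc3. It assumes each minibatch is just a seeded stream of random indices, and that the k-th drawn item of a is paired with the k-th drawn item of b; both are simplifying assumptions, not a description of how pymc3.Minibatch works internally. The helper `sample_pairs` is hypothetical.

```python
import random

def sample_pairs(n_a, n_b, batch_size, seed_a, seed_b, n_batches):
    """Collect the (i, j) index pairs seen when a and b are indexed
    by two separately seeded random index streams."""
    rng_a = random.Random(seed_a)
    rng_b = random.Random(seed_b)
    seen = set()
    for _ in range(n_batches):
        idx_a = [rng_a.randrange(n_a) for _ in range(batch_size)]
        idx_b = [rng_b.randrange(n_b) for _ in range(batch_size)]
        seen.update(zip(idx_a, idx_b))  # pair k-th draw from a with k-th draw from b
    return seen

# Same seed: both streams produce identical indices.
synced = sample_pairs(20, 20, 10, seed_a=42, seed_b=42, n_batches=200)
# Different seeds: the two index streams are independent.
independent = sample_pairs(20, 20, 10, seed_a=42, seed_b=43, n_batches=200)

print(all(i == j for i, j in synced))  # synced pairs all lie on the diagonal
print(len(synced), len(independent))   # distinct pairs seen in each case
```

Under these assumptions, identical seeds only ever produce pairs of the form (i, i), while different seeds let any (i, j) occur. If the model instead forms all pairs within a batch, off-diagonal pairs do appear even with synced indices, but they are no longer sampled uniformly.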


#6

Oh, I think I get what you mean - since a and b are too large to do the dot product, you want to minibatch them to just get a slice… I am not sure it is the right way to do it: I don't think the dot product itself is batchable.