Thanks a lot for the tips!
(2)
That makes sense; it did seem like I was over-complicating things by trying to update the parameters directly. Just to check, though: is the first formula supposed to represent inference under submodel \omega? I'm not sure why k and \omega get mixed in P^{(\omega)}(l,t_k,\theta|D_k), or what D_{kw} represents. I used \sum_{i} P(D_{ki}|l,t_k) above to mean a sum over all datapoints D_{ki} \in D_{k}. Was it supposed to be this instead:
P^{(\omega)}(l,t_\omega,\theta|D_\omega)\propto P(l) P(t_\omega | \theta) P(\theta) P(D_\omega | l, t_\omega)
(1)
I’m not exactly sure what approximating with marginals would entail. What I’m actually interested in computing is just P(l | D_1, \ldots, D_k) and P(t | D_1, \ldots, D_k) after each k.
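To make the update I have in mind concrete, here is a toy sketch: discretize l and t on a grid, keep a joint posterior, update it recursively with each batch D_k, and read off the two marginals P(l | D_1, \ldots, D_k) and P(t | D_1, \ldots, D_k) by summing over the other axis. The grids and `toy_likelihood` are placeholders, not the real model:

```python
import numpy as np

# Hypothetical discrete supports for l and t (placeholders).
L_VALS = np.linspace(0.0, 1.0, 5)
T_VALS = np.linspace(0.0, 1.0, 4)

def toy_likelihood(d, l, t):
    # Stand-in for P(D_ki | l, t_k); just some positive function.
    return np.exp(-(d - l * t) ** 2)

def update(prior, batch):
    """One recursive-Bayes step: posterior proportional to prior times batch likelihood."""
    post = prior.copy()
    for i, l in enumerate(L_VALS):
        for j, t in enumerate(T_VALS):
            for d in batch:
                post[i, j] *= toy_likelihood(d, l, t)
    return post / post.sum()

# Uniform prior over the (l, t) grid.
joint = np.full((len(L_VALS), len(T_VALS)), 1.0)
joint /= joint.sum()

for batch in [[0.2, 0.3], [0.25]]:   # D_1, D_2
    joint = update(joint, batch)
    p_l = joint.sum(axis=1)          # marginal P(l | D_1..D_k)
    p_t = joint.sum(axis=0)          # marginal P(t | D_1..D_k)
```

This is just to clarify what I mean by "after each k"; obviously a grid won't scale, which is partly why I'm asking about the marginal approximation.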
In case it is of any use, this is roughly what the likelihood looks like:
\begin{equation}
P((m,u)=D_{ki} \mid l, t_k) \propto \frac{1}{|C(t_k)|} \sum_{c_i \in C(t_k)} L(m \mid u, c_i, l)
\end{equation}
\begin{equation}
L(m \mid u, c_i, l) =
\begin{cases}
g(u, l, c_i), & f(m, u, l) > 1 \\
h(u, l, c_i), & f(m, u, l) < 0
\end{cases}
\end{equation}
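In code form, the likelihood above is the uniform mixture over contexts c_i \in C(t_k), with the piecewise branch on f. Here g, h, and f are placeholder stubs (their real definitions aren't shown), and note the 0 \le f \le 1 case is left undefined, as in the equation:

```python
def f(m, u, l):
    # Hypothetical decision function; placeholder only.
    return m - u * l

def g(u, l, c):
    # Placeholder for the f > 1 branch density.
    return 0.8

def h(u, l, c):
    # Placeholder for the f < 0 branch density.
    return 0.2

def likelihood(m, u, l, contexts):
    """P((m,u) | l, t_k) up to proportionality: uniform mixture over C(t_k)."""
    total = 0.0
    for c in contexts:
        if f(m, u, l) > 1:
            total += g(u, l, c)
        elif f(m, u, l) < 0:
            total += h(u, l, c)
        # 0 <= f(m, u, l) <= 1: undefined in the original equation.
    return total / len(contexts)
```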
But I think approximating with marginals might be my only viable option.