(1) Your ability to marginalize depends on the structure of P(D|l, t). The general idea would be to put a Dirichlet prior on l, introduce the parameter \alpha to specify its shape, and work with the surrogate likelihood \tilde P(D_k|t_k, \alpha) = \int P(D_k | l, t_k)\,\mathrm{Dir}(l \mid \alpha)\,dl. Importantly, this integral needs a closed-form expression. However, given the dimensionality of \alpha, I would expect the posterior sampling to be fiddly at best. Do you need the full joint distribution, or can you approximate it with marginals, P(D|l,t_k) \approx \prod_j P(D[:, j] \mid l_j, t_k)? (You would then need to re-normalize the posterior means of the l_j.)
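For instance, if the entries of D_k were multinomial counts given l, the integral has the familiar Dirichlet-multinomial closed form. A minimal sketch of evaluating such a surrogate, with the multinomial assumption and the dependence on t_k dropped purely for illustration (the helper name is hypothetical, not from your model):

```python
import numpy as np
from scipy.special import gammaln

def dirichlet_multinomial_logpmf(counts, alpha):
    """Log of the multinomial likelihood in l integrated against Dir(l, alpha).

    Closed-form surrogate one would evaluate in place of sampling l
    (hypothetical helper; your actual likelihood may differ)."""
    counts = np.asarray(counts, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    n, a0 = counts.sum(), alpha.sum()
    return (gammaln(n + 1.0) - gammaln(counts + 1.0).sum()   # multinomial coefficient
            + gammaln(a0) - gammaln(n + a0)                  # normalizers of Dir(alpha) vs Dir(alpha + counts)
            + gammaln(counts + alpha).sum() - gammaln(alpha).sum())

# One batch D_k summarized as counts, with a shared concentration alpha
print(dirichlet_multinomial_logpmf([3, 0, 5], alpha=[1.0, 1.0, 1.0]))
```

In a sampler you would then evaluate this surrogate at each proposed \alpha instead of sampling l directly.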
(2)
Instead of sequentially updating, why not sequentially add data? You have
P^{(\omega)}(l, t_\omega, \theta \mid D_\omega) \propto P(l)\,P(t_\omega \mid \theta)\,P(\theta)\,P(D_\omega \mid l, t_\omega)
Why not define a sequence of likelihoods
Q^{k}(l, t_1, \dots, t_k, \theta | D_1, \dots, D_k) = \prod_{\omega=1}^k P^{(\omega)}(l, t_\omega, \theta | D_\omega)
The \theta-posterior of Q^k would then be the exact posterior of the “sequential update” of the first k submodels.
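As a sanity check of the bookkeeping, here is a toy grid example comparing one joint fit over all batches against batch-by-batch updating; I collapse l and t_\omega into a single Bernoulli parameter \theta and apply the shared prior once, so this only illustrates the sequential-addition idea, not your full model:

```python
import numpy as np

rng = np.random.default_rng(0)
theta = np.linspace(0.001, 0.999, 1000)                     # grid over theta
prior = np.ones_like(theta)                                 # flat prior P(theta)
batches = [rng.integers(0, 2, size=20) for _ in range(4)]   # D_1, ..., D_k

def batch_loglik(D, theta):
    # log P(D_omega | theta) for Bernoulli data (stand-in for P(D_omega | l, t_omega))
    return D.sum() * np.log(theta) + (len(D) - D.sum()) * np.log(1 - theta)

# Sequential update: the posterior after batch omega is the prior for batch omega + 1
seq = prior.copy()
for D in batches:
    seq = seq * np.exp(batch_loglik(D, theta))
    seq /= seq.sum()

# Joint fit: one prior times the product of all batch likelihoods
joint = prior * np.exp(sum(batch_loglik(D, theta) for D in batches))
joint /= joint.sum()

print(np.allclose(seq, joint))   # True: same theta-posterior either way
```

The equality holds because normalization constants cancel; the practical question is only whether your sampler can handle the growing product of per-batch likelihoods in one model.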