Hi all,
I’m very new to the PyMC3 community and also inexperienced with Bayesian modeling, so apologies in advance for any blunders in this post. I’m trying to implement a hierarchical Bayesian model and am struggling to figure out how to calculate the marginal probabilities of some discrete and continuous parameters in order to incrementally update them.
More specifically, this is how I’m doing the inference:
\begin{equation} P(l, t_{k}, \theta \, | \, D_{k}) \propto P(l) \, P(t_{k} \, | \, \theta) \, P(\theta) \, \prod_{i=1}^{|D_{k}|} P(D_{ki} \, | \, l, t_{k}) \end{equation}
where:
- D_{k} is the set of data associated with submodel k
- \theta is a continuous variable, the over-hypothesis about the distribution of t_{k} across multiple submodels
- l is a discrete variable with about 1k possible values
- t_{k} is a discrete variable with 4 possible values
And the following distributions:
\theta \sim \text{Dirichlet}(a), \quad \text{len}(a)=3, \; a \text{ fixed}
t_{k} \sim \text{Categorical}(\theta)
l \sim \text{Categorical}(p), \quad \text{len}(p)=1024, \; p \text{ fixed}
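To make the setup concrete, here is a minimal sketch of what a single submodel could look like in PyMC3. The real likelihood P(D_{ki} \, | \, l, t_{k}) is specific to my problem, so the Normal with an (l, t_k)-indexed mean table below is only a placeholder, and names like `mu_table` and `D_1` are made up; I’ve also sized the concentration vector to match the 4 categories of t_k so the dimensions line up.

```python
import numpy as np
import pymc3 as pm
import theano.tensor as tt

# Placeholder inputs -- all of these are made up for the sketch.
a = np.ones(4)                        # concentration for theta; length must match the number of t_k categories
p = np.ones(1024) / 1024              # fixed prior weights over the 1024 possible values of l
mu_table = np.random.randn(1024, 4)   # hypothetical table linking (l, t_k) to the data mean
D_1 = np.random.randn(100)            # ~100 observations for submodel 1

with pm.Model() as submodel_1:
    theta = pm.Dirichlet('theta', a=a)        # over-hypothesis on the distribution of t_k
    t_1 = pm.Categorical('t_1', p=theta)      # discrete, 4 possible values
    l = pm.Categorical('l', p=p)              # discrete, 1024 possible values

    # Stand-in for P(D_1i | l, t_1): a Normal whose mean is looked up by (l, t_1).
    mu = tt.as_tensor_variable(mu_table)[l, t_1]
    pm.Normal('obs', mu=mu, sigma=1.0, observed=D_1)

    # NUTS samples theta; the discrete variables get Gibbs/Metropolis steps automatically.
    trace = pm.sample(2000, tune=2000, chains=2)
```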
I have no problem calculating the posterior P(l, t_{1}, \theta \, | \, D_{1}), but after fitting the observations in D_{1} (about 100 points) I need some way to update P(l) and P(\theta), so I can substitute them into the inference above and fit a new set of observations D_{2} for submodel 2. This needs to be repeated for about 10 submodels, and I’m particularly interested in seeing how the marginal posteriors over l and t change after each submodel is fitted.
Generally, after fitting N submodels, I should be able to estimate these two distributions:
P(l \, | \, D) = \sum_{t} \int_{\theta} P(l, t, \theta \, | \, D) \, d\theta
P(\theta \, | \, D) = \sum_{l, t} P(l, t, \theta \, | \, D)
where D = \cup_{k=1}^{N} D_{k} and the sum over t runs over all combinations (t_{1}, t_{2}, ..., t_{N}).
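As far as I understand, once I have a trace these marginals are just the posterior samples of the corresponding variables, so an incremental update could perhaps be sketched like this. Smoothing the empirical frequencies of l and moment-matching a Dirichlet to the \theta samples are my own rough guesses rather than an established recipe, and this factorizes the joint posterior, which is exactly the dependence issue I raise below.

```python
import numpy as np

# `trace` is the trace from the single-submodel sketch above.

# Marginal P(l | D_1): the empirical frequencies of the l samples.
# With 1024 values and only a few thousand samples, a small pseudo-count
# keeps unseen values from getting exactly zero prior mass (my own choice).
counts = np.bincount(trace['l'], minlength=1024) + 0.5
p_updated = counts / counts.sum()

# Marginal P(theta | D_1): the theta samples themselves are draws from it.
theta_samples = trace['theta']                # shape (n_samples, len(a))

# Rough moment-matching of a Dirichlet to those samples, so the next submodel
# can again use pm.Dirichlet with an updated concentration vector.
m = theta_samples.mean(axis=0)
v = theta_samples.var(axis=0)
a0 = np.median(m * (1 - m) / v - 1)           # Dirichlet precision implied by each component
a_updated = a0 * m

# p_updated and a_updated would then replace p and a when fitting D_2.
```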
However, I’m not really sure how to approach these two things:
- My first issue is being able to sample from these two distributions in the first place. I read the post here, but I’m not sure how its solution would best be adapted to my situation, since some of my prior parameters (t and \theta) are not conditionally independent. I also have significantly less data to fit.
- Even if I were able to sample from the two distributions using this approach, I would still need to estimate them so I could incorporate information from every submodel into the global model of \theta, and I’m not sure KDE would provide sufficiently accurate estimates. Since each set of data D_{k} is no larger than 100 points, I would prefer to refit all previous observations for every new submodel k and avoid incremental updates entirely (see the sketch after this list), but I am not sure if that is possible.
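For what it’s worth, the “refit everything” version I have in mind would be a single joint model with a shared \theta and l, one t_k per submodel, and all observations seen so far (same placeholder likelihood as in the sketch above):

```python
import numpy as np
import pymc3 as pm
import theano.tensor as tt

def fit_joint(D_list, a, p, mu_table):
    """Refit one joint model over all submodels observed so far.

    D_list: list of 1-D arrays, one per submodel (each around 100 points)."""
    N = len(D_list)
    data = np.concatenate(D_list)
    # index mapping every observation to its submodel
    submodel_idx = np.concatenate([np.full(len(D), k) for k, D in enumerate(D_list)])

    with pm.Model():
        theta = pm.Dirichlet('theta', a=a)
        t = pm.Categorical('t', p=theta, shape=N)   # one t_k per submodel, all tied to theta
        l = pm.Categorical('l', p=p)                # shared across submodels

        # Same placeholder likelihood as before; each observation uses its own submodel's t_k.
        mu = tt.as_tensor_variable(mu_table)[l, t[submodel_idx]]
        pm.Normal('obs', mu=mu, sigma=1.0, observed=data)

        return pm.sample(2000, tune=2000, chains=2)

# After observing submodel 2, simply refit on everything seen so far:
# trace = fit_joint([D_1, D_2], a, p, mu_table)
```

The marginals P(l \, | \, D), P(\theta \, | \, D) and the individual t_k would then come straight out of the trace, with no KDE or incremental updating; the cost is re-running MCMC from scratch each time, though with at most ~10 × 100 observations that might be acceptable.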
Thank you very much for reading this! Any help or tips would be extremely appreciated.