I will use fitting Y = X\beta as an example.
- Both X and Y are data.
- \beta is the parameter that I am trying to find.
- \mathcal{L}(\beta) is the log-likelihood function of the model.
My goal is to find a value of \beta that maximizes \mathcal{L}(\beta).
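As a concrete instance (assuming Gaussian noise with known variance \sigma^{2}; the post does not pin down the exact form of \mathcal{L}), the log-likelihood is \mathcal{L}(\beta) = -\frac{1}{2\sigma^{2}} \lVert Y - X\beta \rVert^{2} + \text{const}, so maximizing it is ordinary least squares.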
I can use pymc3 to fit one replica of the model (a minimal sketch follows this list):
- Define one scalar prior R for \beta.
- Give the theano operation for the log-likelihood \mathcal{L}(\beta) to NUTS.
- Run NUTS to sample \beta.
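Here is a minimal sketch of the one-replica fit, assuming a standard-normal prior for \beta and the Gaussian likelihood above; the synthetic data and all names are mine, not part of the original model:

```python
import numpy as np
import pymc3 as pm

# Synthetic placeholder data; in the real problem X and Y are given.
rng = np.random.RandomState(0)
X = rng.randn(100)
Y = 2.0 * X + rng.randn(100)

with pm.Model() as model:
    beta = pm.Normal("beta", mu=0.0, sd=1.0)             # the scalar prior R
    pm.Normal("Y_obs", mu=X * beta, sd=1.0, observed=Y)  # contributes L(beta)
    trace = pm.sample(1000, tune=1000)                   # NUTS by default
```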
Alternatively, I can use pymc3 to fit 10 replicas of the model (a vectorized sketch follows this list):
- The i^{\text{th}} replica is a Y = X\beta_{i} model with the same data X and Y.
- The ten priors R_{i} for the ten replicas have the same distribution.
- All ten replicas have the same log-likelihood function \mathcal{L}(\beta_{i}).
- All ten replicas are independent. If I build a super-model that consists of all ten replicas, the log-likelihood function of the super-model is the sum of the log-likelihoods of all ten replicas: \mathcal{S}(\beta_{1}, \ldots, \beta_{10}) = \sum_{i} \mathcal{L}(\beta_{i}). This should be the same as using an array of log-likelihoods (NUTS and array of log-likelihoods).
- Give the theano operation for the log-likelihood \mathcal{S}(\beta_{1}, \ldots, \beta_{10}) of the super-model to NUTS.
- Use NUTS to sample all ten \beta_{i} simultaneously.
- Combine the ten posterior distributions for all ten \beta_{i} to form one posterior distribution.
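Here is a sketch of the super-model under the same Gaussian assumptions. Giving \beta shape=(10,) makes pymc3 sum the pointwise log-densities over all entries, which is exactly \mathcal{S}(\beta_{1}, \ldots, \beta_{10}), and turns the per-replica multiply into one broadcasted theano op:

```python
import numpy as np
import pymc3 as pm

n_rep = 10
rng = np.random.RandomState(0)
X = rng.randn(100)
Y = 2.0 * X + rng.randn(100)

with pm.Model() as super_model:
    # Ten i.i.d. priors R_i, one per replica.
    beta = pm.Normal("beta", mu=0.0, sd=1.0, shape=n_rep)
    # (100, 1) * (10,) broadcasts to (100, 10); column i is replica i.
    mu = X[:, None] * beta
    # The same Y in every column; summing the elementwise log-likelihoods
    # over all entries gives S = sum_i L(beta_i).
    pm.Normal("Y_obs", mu=mu, sd=1.0,
              observed=np.tile(Y[:, None], (1, n_rep)))
    trace = pm.sample(1000, tune=1000)

# Pool the draws of all ten beta_i into one posterior sample (last bullet).
pooled = trace["beta"].reshape(-1)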
The end result should be the same as fitting one replica of the model, except that I get higher throughput. I plan to combine the matrix multiplication step in all the replicas into one big matrix multiplication step on the GPU.
Ok … NUTS doesn’t know that the replicas are independent. A dense mass matrix over all ten \beta_{i} would be 10^{2} times larger than a single replica’s, even though the off-diagonal blocks between replicas are truly zero. Hopefully theano can schedule that on the GPU.
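Two mitigations, both assumptions on my part rather than anything pymc3 promises about this exact setup: pymc3’s NUTS adapts a diagonal mass matrix by default, which grows only linearly with the number of replicas, and theano is pointed at the GPU through THEANO_FLAGS:

```python
# Assumes the super_model from the sketch above. "adapt_diag" keeps the
# mass matrix diagonal, so its size grows linearly in the number of
# replicas instead of the 10^2 blow-up of a dense mass matrix.
with super_model:
    trace = pm.sample(1000, tune=1000, init="adapt_diag")
```

Launching the script with e.g. THEANO_FLAGS="device=cuda,floatX=float32" asks theano to place the big multiply on the GPU; whether it actually fuses and schedules it efficiently is something I would profile.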