What is pm.Potential?

I will use fitting Y = X\beta as an example.

  • Both X and Y are data.
  • \beta is the parameter that I am trying to find.
  • \mathcal{L}(\beta) is the log-likelihood function of the model.

My goal is to find a value of \beta that maximizes \mathcal{L}(\beta).
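
For concreteness, if I assume Gaussian noise with a known scale \sigma (an assumption I add for the sketches below, not something stated above), the log-likelihood is, up to an additive constant,

  \mathcal{L}(\beta) = -\frac{1}{2\sigma^{2}} \lVert Y - X\beta \rVert^{2}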


I can use pymc3 to fit one replica of the model:

  • Define one scalar prior R for \beta
  • Give the theano operation for the log-likelihood \mathcal{L}(\beta) to NUTS.
  • Run NUTS to sample \beta (a sketch of these steps follows this list).
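
A minimal sketch of these three steps under the Gaussian-noise assumption above. X, Y, sigma, and every other name here are placeholders I made up for illustration; I also take \beta to be a length-3 vector:

    import numpy as np
    import pymc3 as pm
    import theano.tensor as tt

    # Synthetic stand-ins for the real data.
    rng = np.random.RandomState(0)
    X = rng.normal(size=(100, 3))
    Y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)
    sigma = 0.1  # assumed known noise scale

    with pm.Model():
        beta = pm.Flat("beta", shape=3)     # the prior R for beta
        mu = tt.dot(X, beta)                # theano operation computing X beta
        # Gaussian log-likelihood L(beta), up to an additive constant,
        # handed to the sampler through pm.Potential.
        pm.Potential("loglike", -0.5 * tt.sum(((Y - mu) / sigma) ** 2))
        trace = pm.sample(1000, tune=1000)  # NUTS is the default sampler

As far as I understand it, pm.Potential simply adds an arbitrary term to the model's log-probability, which is what makes the second step possible.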

Alternatively, I can use pymc3 to fit 10 replicas of the model.

  • The i^{\text{th}} replica is a Y = X\beta_{i} model with the same data X and Y.
  • The ten priors R_{i} for the ten replicas have the same distribution.
  • All ten replicas have the same log-likelihood function \mathcal{L}(\beta_{i}).
  • All ten replicas are independent. If I have a super-model consisting of all ten replicas, the log-likelihood function of the super-model is the sum of the log-likelihoods of all ten replicas: \mathcal{S}(\beta_{1}, \ldots, \beta_{10}) = \sum_{i} \mathcal{L}(\beta_{i}). This should be the same as using an array of log-likelihoods (NUTS and array of log-likelihoods).
  • Give the theano operation for the log-likelihood \mathcal{S}(\beta_{1}, \ldots, \beta_{10}) of the super-model to NUTS.
  • Use NUTS to sample all ten \beta_{i} simultaneously.
  • Combine the ten posterior distributions for all ten \beta_{i} to form one posterior distribution (see the sketch after this list).
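
A sketch of the super-model, reusing X, Y, and sigma from the snippet above. The single tt.dot below is the one big matrix multiplication mentioned after this list:

    with pm.Model():
        beta = pm.Flat("beta", shape=(10, 3))  # ten priors R_i with the same distribution
        # One big matrix multiplication covering all replicas:
        # (100, 3) x (3, 10) -> (100, 10), one column per replica.
        mu = tt.dot(X, beta.T)
        # Summing the squared residuals over both axes yields
        # S(beta_1, ..., beta_10) = sum_i L(beta_i).
        pm.Potential("loglike_sum", -0.5 * tt.sum(((Y[:, None] - mu) / sigma) ** 2))
        trace = pm.sample(1000, tune=1000)

    # Combine the ten independent posteriors into one pooled posterior sample.
    pooled = trace["beta"].reshape(-1, 3)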

The end result should be the same as fitting one replica of the model, except that I get higher throughput. I plan to combine the matrix multiplication steps of all the replicas into one big matrix multiplication on the GPU.

Ok … NUTS doesn’t know that the ten replicas are independent. The mass matrix would be 10^{2} times larger (if it is dense; pymc3’s NUTS adapts a diagonal mass matrix by default, which is only 10 times larger). Hopefully theano can schedule that on the GPU.
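
For what it’s worth, the usual way I know of to point theano at the GPU is to set THEANO_FLAGS before theano is first imported (this assumes the libgpuarray backend is installed; a sketch, not a tested recipe):

    import os
    # Must run before the first `import theano` anywhere in the process.
    os.environ["THEANO_FLAGS"] = "device=cuda,floatX=float32"
    import theano  # theano now compiles its ops for the GPU where it can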