Here is a recording of a live stream I presented last week on the topic of Bayesian QLearning, with a code example using PyMC3.
This might be of interest also because of the following:

I show an example of estimating a model using a sparse tensor: I have a multidimensional random variable, but each data point in my observed tensor corresponds to updating only one instance of my random variables.

I show an example of incremental (online) model estimation, where at each new iteration, the posteriors from the previous iteration become the priors on the new iteration.