How to make a TensorFlow ML model compatible with PyMC3?

The first issue you would run into is that it isn't clear to me how you can get the gradient term $\frac{\partial F}{\partial \theta}$ out of the scikit-learn interface, particularly for RandomForestRegressor. I think this is probably a herculean task and you would have to write a lot of code for it yourself. If you were just doing something like multivariate linear regression, this wouldn't be too hard, since you could access the coefficient/intercept matrices and pass them through an analytic formula for the model gradient.
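To illustrate the linear case, here is a minimal sketch (with made-up data) of what I mean: for a fitted linear model $F(x) = Wx + b$, the Jacobian of the outputs with respect to the inputs is just the coefficient matrix, so no automatic differentiation is needed.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# hypothetical training data: X has columns [AGE, RM, DIS], Y has [PRICE, TAX]
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
Y = X @ rng.normal(size=(3, 2)) + rng.normal(scale=0.1, size=(100, 2))

model = LinearRegression().fit(X, Y)

# For a linear model the gradient of the outputs with respect to the inputs
# is constant and equal to coef_ (shape: n_outputs x n_inputs)
jacobian = model.coef_  # dF_i/dx_j
print(jacobian.shape)   # (2, 3)
```

Nothing like this exists for a random forest, whose prediction is piecewise constant and has no useful gradient.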

The second issue is more fundamental, in that I don't think you are playing to the strengths of PyMC (and Bayesian analysis in general) by doing this. By training a neural network you're estimating a single set of parameters $\theta^*$ for a model that takes you from [AGE, RM, DIS] to [PRICE, TAX], i.e. a mapping $F_{\theta^*}: \{ \text{all ages} \} \times \{ \text{all rm} \} \times \{ \text{all dis} \} \to \{ \text{all prices} \} \times \{ \text{all taxes} \}$. You're then trying to gauge the uncertainty (given $\theta^*$) in [AGE, RM, DIS] given an observation of [PRICE, TAX]. IMO the "Bayesian" way of doing this would be to learn a mapping in the reverse direction, with uncertainties in $\theta$, i.e. the mapping $F_{\theta}: \{ \text{all prices} \} \times \{ \text{all taxes} \} \to \{ \text{all ages} \} \times \{ \text{all rm} \} \times \{ \text{all dis} \}$. Then, given some [PRICE, TAX], you would automatically get the uncertainties on [AGE, RM, DIS]. These uncertainties would include the parametric uncertainties coming from your knowledge of $\theta$, in addition to observational noise.
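Concretely, a minimal sketch of that inverse direction with a simple linear model in PyMC3 (the data arrays here are made-up placeholders) could look like this:

```python
import numpy as np
import pymc3 as pm

# hypothetical data: [PRICE, TAX] as inputs, [AGE, RM, DIS] as outputs
rng = np.random.default_rng(0)
price_tax = rng.normal(size=(100, 2))
age_rm_dis = rng.normal(size=(100, 3))

with pm.Model() as inverse_model:
    # priors on theta = (W, b) carry the parametric uncertainty
    W = pm.Normal("W", mu=0.0, sigma=1.0, shape=(2, 3))
    b = pm.Normal("b", mu=0.0, sigma=1.0, shape=3)
    sigma = pm.HalfNormal("sigma", sigma=1.0)  # observation noise

    mu = pm.math.dot(price_tax, W) + b
    pm.Normal("obs", mu=mu, sigma=sigma, observed=age_rm_dis)
    trace = pm.sample(1000, tune=1000)
```

The posterior predictive at a new [PRICE, TAX] then gives you the distribution over [AGE, RM, DIS] directly.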

This basically comes down to the difference between estimating a posterior on your machine learning model parameters $\theta$ versus fixing $\theta^*$ and asking for the posterior on inputs given some outputs. You can certainly do the latter, but in that case it isn't clear to me how you could get the model gradients out of scikit-learn.
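If you do want to try the latter route anyway, here is a minimal sketch of wrapping a black-box scikit-learn predictor in a Theano Op without a gradient, which forces you onto a gradient-free sampler like Metropolis. Everything here (data, noise scale, priors) is a made-up placeholder.

```python
import numpy as np
import pymc3 as pm
import theano.tensor as tt
from sklearn.ensemble import RandomForestRegressor

# hypothetical training data: X = [AGE, RM, DIS], Y = [PRICE, TAX]
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 3))
Y_train = X_train @ rng.normal(size=(3, 2))
forest = RandomForestRegressor(n_estimators=50).fit(X_train, Y_train)

class PredictOp(tt.Op):
    """Wrap the fitted model's predict() as a Theano Op (no grad() defined)."""
    itypes = [tt.dvector]  # x = [AGE, RM, DIS]
    otypes = [tt.dvector]  # F(x) = [PRICE, TAX]

    def __init__(self, model):
        self.model = model

    def perform(self, node, inputs, outputs):
        (x,) = inputs
        outputs[0][0] = self.model.predict(x.reshape(1, -1)).ravel()

predict_op = PredictOp(forest)
observed = np.array([0.5, -0.2])  # hypothetical [PRICE, TAX] observation

with pm.Model():
    x = pm.Normal("x", mu=0.0, sigma=1.0, shape=3)
    f = predict_op(x)
    pm.Normal("obs", mu=f, sigma=0.1, observed=observed)
    # Metropolis works without gradients, but mixes far worse than NUTS
    trace = pm.sample(2000, step=pm.Metropolis())
```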

As for the former, you have two options. First, you could use a much simpler model (linear regression or some other model you could code up in Theano) to estimate the inverse mapping, sample the posterior, and get the uncertainties for free. You wouldn't have to worry about gradients or a custom Theano Op. Your second option, which gets complicated very quickly, is to use a Bayesian neural network. Here you would be training a neural network for the inverse mapping that learns distributions over its weights and biases rather than single values (the usual approach). Then, when you query the neural network for the output given some input [PRICE, TAX], you get back a probability distribution over the vector [AGE, RM, DIS], which seems to be what you're after in the first place.
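Here is a minimal sketch of what that second option might look like in PyMC3, with one hidden layer and made-up data; the layer size, priors, and ADVI settings are all placeholder choices, not a recommendation.

```python
import numpy as np
import pymc3 as pm
import theano.tensor as tt

# hypothetical data: inverse direction, [PRICE, TAX] -> [AGE, RM, DIS]
rng = np.random.default_rng(0)
price_tax = rng.normal(size=(100, 2))
age_rm_dis = rng.normal(size=(100, 3))
n_hidden = 8

with pm.Model() as bnn:
    # distributions over weights instead of point estimates
    w_in = pm.Normal("w_in", mu=0.0, sigma=1.0, shape=(2, n_hidden))
    w_out = pm.Normal("w_out", mu=0.0, sigma=1.0, shape=(n_hidden, 3))
    sigma = pm.HalfNormal("sigma", sigma=1.0)

    hidden = tt.tanh(tt.dot(price_tax, w_in))
    mu = tt.dot(hidden, w_out)
    pm.Normal("obs", mu=mu, sigma=sigma, observed=age_rm_dis)

    # ADVI tends to scale better than NUTS when there are many weights
    approx = pm.fit(20000, method="advi")
    trace = approx.sample(1000)

# to query at a new [PRICE, TAX], push it through each posterior weight
# sample to get a distribution over [AGE, RM, DIS]
new_pt = np.array([0.5, -0.2])
draws = np.stack([
    np.tanh(new_pt @ w1) @ w2
    for w1, w2 in zip(trace["w_in"], trace["w_out"])
])
```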

I initially thought you were trying to train a fixed-parameter mapping $F_{\eta^*}: \Theta \to \mathcal{Y}$ from a parameter space $\Theta$ to a feature space $\mathcal{Y}$, but you are actually training a fixed mapping $F_{\eta^*}: \mathcal{X} \to \mathcal{Y}$ between two feature spaces and then trying to treat an underlying $x \in \mathcal{X}$ as your "parameters". The first approach will work for things like surrogate models (I was learning solutions of parametric ODEs), but unfortunately not for your case.
