Should we use TensorFlow for PyMC4? What would be the issues of doing so?
Pros are that some successful approaches already exist in the form of (1) Edward (http://edwardlib.org/), which will be moved into TensorFlow.contrib, which might ensure that TensorFlow will have more capabilities to support probabilistic programming, and (2) ZhuSuan https://github.com/thu-ml/zhusuan.
Discussion elsewhere; input welcome.
TensorFlow (backed by Google)
Pros:
Strong support, large community
Tensorboard for visualization
Parallel within-graph execution that makes things fast
Convenient deployment
Cons:
Slow?
Too many similar packages already (e.g., Edward, ZhuSuan, Greta in R)
Other remarks:
Compiler for graphs in development: XLA. It is supposed to support memory optimisations and Op fusion when it’s done.
Custom ops can be written in Python, but C++ ops require us to play games with that strange build system (Bazel, which is written in Java and probably not going to end up in any Linux packages)
To my knowledge of the PyMC3 code, an important issue is that TF doesn't officially support a theano.clone() equivalent. This will make it hard to reuse model graphs; we may need to come up with an inference scheme that doesn't rely on graph copying.
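To make the missing operation concrete: theano.clone() rebuilds an existing expression graph with some sub-nodes substituted. A toy pure-Python sketch of that idea (hypothetical `Node`/`clone` names, not Theano or TensorFlow code):

```python
# Toy expression graph illustrating what theano.clone(replace=...) provides:
# rebuilding an existing graph with some nodes swapped out. Hypothetical
# sketch only, not Theano's or TensorFlow's implementation.

class Node:
    def __init__(self, op, *inputs, value=None):
        self.op, self.inputs, self.value = op, inputs, value

    def eval(self):
        if self.op == "const":
            return self.value
        args = [n.eval() for n in self.inputs]
        return args[0] + args[1] if self.op == "add" else args[0] * args[1]

def clone(node, replace):
    """Rebuild `node`, substituting any sub-node found in `replace`."""
    if node in replace:
        return replace[node]
    if node.op == "const":
        return node
    return Node(node.op, *[clone(i, replace) for i in node.inputs])

x = Node("const", value=2.0)
y = Node("const", value=3.0)
expr = Node("add", Node("mul", x, y), y)   # x*y + y

# Swap x for a new constant without rebuilding the graph by hand.
expr2 = clone(expr, {x: Node("const", value=10.0)})
print(expr.eval())   # 9.0
print(expr2.eval())  # 33.0
```

Without an official equivalent of this substitution step, reusing one model graph for, say, both sampling and posterior predictive evaluation becomes awkward.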
I agree with the pros of TensorFlow and I see the concern about similar packages. Just curious, has there been any conversation about a potential merge with Edward or ZhuSuan? I believe NumPy started out in a similar way... I prefer PyMC over the others by far. I also think there are more resources on PyMC than on Edward or ZhuSuan, from what I have seen at least. I think if TensorFlow were the backend it would open up a lot of the community to probabilistic programming, which would mean more resources and knowledge flowing around.
I can more or less see how MCMC is implemented in TensorFlow (this is currently how it is done in tensorflow/probability and bayesflow):
a callable logp function written in tensorflow
a kernel that takes the current state and proposes a new state, using a proposal function and the logp for the MH accept step
wrap the kernel in a tf.while_loop; however, the while_loop only outputs the final state, so to get the whole MCMC trace:
wrap that while_loop in a tf.scan, which outputs the state after each while_loop call
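The four steps above can be sketched in pure NumPy, with plain Python functions standing in for the tf.while_loop and tf.scan constructs (the names `while_loop`/`scan` here are illustrative, not the tensorflow/probability API):

```python
# Pure-NumPy sketch of the loop structure described above: a logp callable,
# a Metropolis-Hastings kernel mapping state -> new state, an inner
# "while_loop" (thinning steps) that returns only the final state, and an
# outer "scan" that records one state per trace entry.
import numpy as np

rng = np.random.default_rng(0)

def logp(x):                      # target: standard normal
    return -0.5 * x ** 2

def mh_kernel(x):                 # one MH transition
    prop = x + rng.normal(scale=0.5)
    if np.log(rng.uniform()) < logp(prop) - logp(x):
        return prop
    return x

def while_loop(x, n_inner):       # analogue of tf.while_loop: final state only
    for _ in range(n_inner):
        x = mh_kernel(x)
    return x

def scan(x0, n_trace, n_inner):   # analogue of tf.scan: keep every output
    trace, x = [], x0
    for _ in range(n_trace):
        x = while_loop(x, n_inner)
        trace.append(x)
    return np.array(trace)

trace = scan(0.0, n_trace=2000, n_inner=5)
print(trace.mean(), trace.std())  # roughly 0 and 1 for a standard normal
```

In the TF version the same structure is built as a graph, which is why debugging it is harder: the loop body runs inside the runtime rather than as plain Python.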
I did some experiments: https://gist.github.com/junpenglao/05b0ed11219df25ff56ddc452fc1f6af. But I didn't use tf.scan there. There are quite a lot of changes happening currently in tensorflow/probability and tensorflow.contrib.bayesflow, so I would probably wait a bit before doing more experiments.
So for PyMC4 to work using tensorflow, we need:
logp in TensorFlow, relatively easy with TensorFlow Distributions and the pm.Model context available
kernels; the step_methods need to be rewritten, but that is also not very difficult
(2.5) compound step: combining different kernels into one could be tricky; maybe the compound step becomes a big function that takes different kernels as input?
(3 and 4) rewrite pm.sampling with tf.while_loop and tf.scan. Doable, but there will be lots of painful debugging of the graph.
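The "big function that takes different kernels as input" idea from point 2.5 could look something like this framework-free sketch, where a compound step is just a composition of kernels, each updating its own block of the state (hypothetical interface, not actual PyMC code):

```python
# Minimal sketch of a compound step: take several kernels, each responsible
# for a subset of the state, and return a single kernel that applies them
# in sequence. Hypothetical names, not actual PyMC4 or TF code.

def compound_step(kernels):
    """kernels: list of (names, kernel_fn), where kernel_fn maps a
    sub-state dict to an updated sub-state dict."""
    def step(state):
        state = dict(state)
        for names, kernel_fn in kernels:
            sub = {k: state[k] for k in names}
            state.update(kernel_fn(sub))
        return state
    return step

# Two toy kernels: one "updates" mu, the other sigma.
bump_mu = (["mu"], lambda s: {"mu": s["mu"] + 1.0})
halve_sigma = (["sigma"], lambda s: {"sigma": s["sigma"] * 0.5})

step = compound_step([bump_mu, halve_sigma])
print(step({"mu": 0.0, "sigma": 2.0}))  # {'mu': 1.0, 'sigma': 1.0}
```

The tricky part in TF would be doing this inside a graph: each kernel would have to be expressible as tensor ops so the composed step can live in the tf.while_loop body.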
TF Probability is in early stages. But we plan to launch in a few weeks(!).
Edward2 is fairly low-level. One possible future is that PyMC4 sits as a higher-level language on top, where PyMC4's major value-adds are more automated fitting, non-TF prerequisites for model-building, visualization, and much more. I suspect this also fits your audience.
Edward2’s core should be in TF Probability in coming days.
Given you all are potential developers (not just users) of TF Probability, we’d love to set up a meeting so we can share more details. (No need for all core devs to attend this meeting if scheduling times are difficult; we can iterate fast and hopefully stay in regular contact.)
Thanks for reaching out @dustinvtran, sounds like good progress is being made. We'd be more than excited to chat more about basing PyMC4 on TF Probability / Edward. I'll drop you an email for scheduling.
This initial commit for Edward2 adds three submodules. Together they comprise minimal software for building flexible probabilistic programs alongside flexible computation during training and testing. They are:
interceptor, which is a low-level tool for intercepting execution of programs;
random_variable, which provides a primitive for building probabilistic programs;
generated_random_variables, which wraps TensorFlow Distributions as an easy way to write specific random variables.
See the latest commit, which adds make_log_joint_fn. This function should be powerful enough to enable most, if not all, algorithms in PyMC4, given an Edward program as a low-level representation of PyMC4 models.
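To make the mechanism concrete, here is a toy pure-Python analogue of what make_log_joint_fn provides: run a generative model function while intercepting each random-variable construction, substituting supplied values and accumulating their log-densities. This is a sketch of the idea only, with a hypothetical `Normal` callable passed in explicitly; it is not Edward2's implementation, which does the interception implicitly.

```python
# Toy analogue of make_log_joint_fn: turn a generative model function into
# a function computing the log joint density at given variable values.
# Hypothetical sketch, not Edward2 code.
import math

def normal_logpdf(x, loc, scale):
    z = (x - loc) / scale
    return -0.5 * z ** 2 - math.log(scale) - 0.5 * math.log(2 * math.pi)

def make_log_joint_fn(model):
    """Return log_joint(**values): run `model`, intercepting each Normal()
    call so it yields the supplied value and adds its log-density."""
    def log_joint(**values):
        total = 0.0

        def Normal(loc, scale, name):
            nonlocal total
            x = values[name]           # substitute the supplied value
            total += normal_logpdf(x, loc, scale)
            return x

        model(Normal)
        return total
    return log_joint

# A two-variable model written as an ordinary Python function:
# mu ~ Normal(0, 1), x ~ Normal(mu, 1).
def model(Normal):
    mu = Normal(0.0, 1.0, name="mu")
    Normal(mu, 1.0, name="x")

log_joint = make_log_joint_fn(model)
print(log_joint(mu=0.0, x=0.0))  # two standard-normal logpdfs at 0
```

Given such a log joint, any of the MCMC kernels discussed above can be driven off it, which is why this one function is enough to hook PyMC4-style inference onto Edward programs.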