Should we use TensorFlow for PyMC4? What would be the issues of doing so?
Pros are that some successful approaches already exist in the form of (1) Edward (http://edwardlib.org/), which will be moved into TensorFlow.contrib, which might ensure that TensorFlow will have more capabilities to support probabilistic programming, and (2) ZhuSuan https://github.com/thu-ml/zhusuan.
Discussion elsewhere; input welcome.
TensorFlow (backed by Google)
Pros:
Strong support, large community
Tensorboard for visualization
Parallel within-graph execution that makes things fast
Convenient deployment
Cons:
Slow?
Too many similar packages already (e.g., Edward, ZhuSuan, Greta in R)
Other remarks:
Compiler for graphs in development: XLA. It is supposed to support memory optimisations and Op fusion when it’s done.
Custom ops can be written in Python, but C++ ops require us to play games with that strange build system (Bazel, which is written in Java and probably not going to end up in any Linux packages)
To my knowledge of the PyMC3 code, an important issue is that TF doesn't officially support a theano.clone() equivalent. This will make it hard to reuse model graphs; we may need to come up with an inference scheme that doesn't rely on graph copying.
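To make the missing operation concrete: theano.clone() rebuilds an existing expression graph with some sub-nodes substituted. A toy pure-Python sketch of that idea (hypothetical `Node`/`clone` names, not Theano or TensorFlow code):

```python
# Toy expression graph illustrating what theano.clone(replace=...) provides:
# rebuilding an existing graph with some nodes swapped out. Hypothetical
# sketch only, not Theano's or TensorFlow's implementation.

class Node:
    def __init__(self, op, *inputs, value=None):
        self.op, self.inputs, self.value = op, inputs, value

    def eval(self):
        if self.op == "const":
            return self.value
        args = [n.eval() for n in self.inputs]
        return args[0] + args[1] if self.op == "add" else args[0] * args[1]

def clone(node, replace):
    """Rebuild `node`, substituting any sub-node found in `replace`."""
    if node in replace:
        return replace[node]
    if node.op == "const":
        return node
    return Node(node.op, *[clone(i, replace) for i in node.inputs])

x = Node("const", value=2.0)
y = Node("const", value=3.0)
expr = Node("add", Node("mul", x, y), y)   # x*y + y

# Swap x for a new constant without rebuilding the graph by hand.
expr2 = clone(expr, {x: Node("const", value=10.0)})
print(expr.eval())   # 9.0
print(expr2.eval())  # 33.0
```

Without an official equivalent of this substitution step, reusing one model graph for, say, both sampling and posterior predictive evaluation becomes awkward.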
I agree with the pros of TensorFlow and I see the concern about similar packages. Just curious, has there been any conversation about a potential merge with Edward or ZhuSuan? I believe NumPy started out in a similar way... I prefer PyMC over the others by far. I also think there are more resources on PyMC than on Edward or ZhuSuan, from what I have seen at least. I think if TensorFlow were the backend it would open up a lot of the community to probabilistic programming, which would mean more resources and knowledge flowing around.
I can more or less see how MCMC is implemented in TensorFlow (this is currently how it is done in tensorflow/probability and bayesflow):
a callable logp function written in tensorflow
a kernel that takes the current state and proposes a new state, using a proposal function and the logp for the MH accept step
wrap the kernel in a tf.while_loop; however, the while_loop only outputs the final state, so to get the whole MCMC trace:
wrap that while_loop in a tf.scan, which outputs the state after each while_loop call
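The four steps above can be sketched in pure NumPy, with plain Python functions standing in for the tf.while_loop and tf.scan constructs (the names `while_loop`/`scan` here are illustrative, not the tensorflow/probability API):

```python
# Pure-NumPy sketch of the loop structure described above: a logp callable,
# a Metropolis-Hastings kernel mapping state -> new state, an inner
# "while_loop" (thinning steps) that returns only the final state, and an
# outer "scan" that records one state per trace entry.
import numpy as np

rng = np.random.default_rng(0)

def logp(x):                      # target: standard normal
    return -0.5 * x ** 2

def mh_kernel(x):                 # one MH transition
    prop = x + rng.normal(scale=0.5)
    if np.log(rng.uniform()) < logp(prop) - logp(x):
        return prop
    return x

def while_loop(x, n_inner):       # analogue of tf.while_loop: final state only
    for _ in range(n_inner):
        x = mh_kernel(x)
    return x

def scan(x0, n_trace, n_inner):   # analogue of tf.scan: keep every output
    trace, x = [], x0
    for _ in range(n_trace):
        x = while_loop(x, n_inner)
        trace.append(x)
    return np.array(trace)

trace = scan(0.0, n_trace=2000, n_inner=5)
print(trace.mean(), trace.std())  # roughly 0 and 1 for a standard normal
```

In the TF version the same structure is built as a graph, which is why debugging it is harder: the loop body runs inside the runtime rather than as plain Python.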
I did some experiments: https://gist.github.com/junpenglao/05b0ed11219df25ff56ddc452fc1f6af. But I didn't use tf.scan there. There are quite a lot of changes happening currently in tensorflow/probability and tensorflow.contrib.bayesflow, so I would probably wait a bit before doing more experiments.
So for PyMC4 to work using tensorflow, we need:
logp in TensorFlow, relatively easy with TensorFlow Distributions and the pm.Model context available
kernels; the step_methods need to be rewritten, but that is also not very difficult
(2.5) compound step: combining different kernels into one could be tricky; maybe the compound step becomes a big function that takes different kernels as input?
(3 and 4) rewrite pm.sampling with tf.while_loop and tf.scan. Doable, but there will be lots of painful debugging of the graph.
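The "big function that takes different kernels as input" idea from point 2.5 could look something like this framework-free sketch, where a compound step is just a composition of kernels, each updating its own block of the state (hypothetical interface, not actual PyMC code):

```python
# Minimal sketch of a compound step: take several kernels, each responsible
# for a subset of the state, and return a single kernel that applies them
# in sequence. Hypothetical names, not actual PyMC4 or TF code.

def compound_step(kernels):
    """kernels: list of (names, kernel_fn), where kernel_fn maps a
    sub-state dict to an updated sub-state dict."""
    def step(state):
        state = dict(state)
        for names, kernel_fn in kernels:
            sub = {k: state[k] for k in names}
            state.update(kernel_fn(sub))
        return state
    return step

# Two toy kernels: one "updates" mu, the other sigma.
bump_mu = (["mu"], lambda s: {"mu": s["mu"] + 1.0})
halve_sigma = (["sigma"], lambda s: {"sigma": s["sigma"] * 0.5})

step = compound_step([bump_mu, halve_sigma])
print(step({"mu": 0.0, "sigma": 2.0}))  # {'mu': 1.0, 'sigma': 1.0}
```

The tricky part in TF would be doing this inside a graph: each kernel would have to be expressible as tensor ops so the composed step can live in the tf.while_loop body.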
TF Probability is in early stages. But we plan to launch in a few weeks(!).
Edward2 is fairly low-level. One possible future is that PyMC4 sits as a higher-level language on top, where PyMC4's major value-adds are more automated fitting, non-TF prerequisites for model-building, visualization, and much more. I suspect this also fits your audience.
Edward2’s core should be in TF Probability in coming days.
Given you all are potential developers (not just users) of TF Probability, we’d love to set up a meeting so we can share more details. (No need for all core devs to attend this meeting if scheduling times are difficult; we can iterate fast and hopefully stay in regular contact.)
Thanks for reaching out @dustinvtran, sounds like good progress is being made. We'd be more than excited to chat more about basing PyMC4 on TF Probability / Edward. I'll drop you an email for scheduling.
This initial commit for Edward2 adds three submodules. Together they comprise minimal software for building flexible probabilistic programs alongside flexible computation during training and testing. They are:
interceptor, which is a low-level tool for intercepting execution of programs;
random_variable, which provides a primitive for building probabilistic programs;
generated_random_variables, which wraps TensorFlow Distributions as an easy way to write specific random variables.
See the latest commit, which adds make_log_joint_fn. This function should be powerful enough to enable most, if not all, algorithms in PyMC4, given an Edward program as a low-level representation of PyMC4 models.
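To make the mechanism concrete, here is a toy pure-Python analogue of what make_log_joint_fn provides: run a generative model function while intercepting each random-variable construction, substituting supplied values and accumulating their log-densities. This is a sketch of the idea only, with a hypothetical `Normal` callable passed in explicitly; it is not Edward2's implementation, which does the interception implicitly.

```python
# Toy analogue of make_log_joint_fn: turn a generative model function into
# a function computing the log joint density at given variable values.
# Hypothetical sketch, not Edward2 code.
import math

def normal_logpdf(x, loc, scale):
    z = (x - loc) / scale
    return -0.5 * z ** 2 - math.log(scale) - 0.5 * math.log(2 * math.pi)

def make_log_joint_fn(model):
    """Return log_joint(**values): run `model`, intercepting each Normal()
    call so it yields the supplied value and adds its log-density."""
    def log_joint(**values):
        total = 0.0

        def Normal(loc, scale, name):
            nonlocal total
            x = values[name]           # substitute the supplied value
            total += normal_logpdf(x, loc, scale)
            return x

        model(Normal)
        return total
    return log_joint

# A two-variable model written as an ordinary Python function:
# mu ~ Normal(0, 1), x ~ Normal(mu, 1).
def model(Normal):
    mu = Normal(0.0, 1.0, name="mu")
    Normal(mu, 1.0, name="x")

log_joint = make_log_joint_fn(model)
print(log_joint(mu=0.0, x=0.0))  # two standard-normal logpdfs at 0
```

Given such a log joint, any of the MCMC kernels discussed above can be driven off it, which is why this one function is enough to hook PyMC4-style inference onto Edward programs.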