Draw_values() speed/scaling with transformed variables

I have a model with a variable defined as a deterministic transformation from another:

import pymc3 as pm
with pm.Model() as model:
    logP = pm.Uniform('logP', 0, 4)
    P = pm.Deterministic('P', 10**logP)

and I would like to generate a large number of samples (hundreds of millions or more) from the prior over P. To do this, I have been using:

from pymc3.distributions.distribution import draw_values
samples = draw_values([P], size=1_000_000)

However, I’ve noticed that this is extremely slow, and the run time scales linearly with the number of samples requested: see a full working example here https://gist.github.com/adrn/fc341f725303760e7bfe98c17f5188b7. This smells like a Python loop being executed somewhere?

From what I can tell from profiling, this is the line responsible for the slowness: https://github.com/pymc-devs/pymc3/blob/master/pymc3/distributions/distribution.py#L850. However, I can’t find any explicit loop over samples there, though to be fair I don’t really understand what happens internally to produce the samples of the transformed variable.

Does anyone have any ideas here? Thanks!

@lucianopaz and @rpgoldman are working on it, but yes, the draw_values currently used in prior predictive sampling relies on a for loop.


Yes, there is a python loop involved but it’s kind of hidden.
The deterministic, P, is a theano tensor variable. To execute the operation and get a concrete value back, we first compile the expression to a callable function. That’s where _compile_theano_function comes in. In your case, you basically get a function that looks like P = f(logP). A common complication with RV shapes is that theano is very picky about the number of dimensions of its tensor variables. As you’ve written them, P and logP are scalars, which means that they have ndim=0. If you try to pass a vector of logP values as input to f, you’ll get a ValueError.
The way we get draw_values to work under these circumstances is to wrap f with numpy.vectorize, using a signature that is aware of the dimensions of the theano tensors. In the end, numpy is responsible for the looping inside of vectorize.
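
To make the hidden loop concrete, here is a small standalone sketch (not the actual pymc3 internals, just an illustration of the mechanism): f stands in for the compiled theano function, and numpy.vectorize with a scalar signature ends up looping over the inputs at the Python level.

import numpy as np

# Stand-in for the compiled theano function P = f(logP); it only handles
# scalar (0-dim) inputs.
def f(logP):
    return 10.0 ** logP

# draw_values-style wrapping: scalar-in/scalar-out signature, so numpy
# loops over the samples one by one in Python.
vec_f = np.vectorize(f, signature="()->()")
samples = vec_f(np.random.uniform(0, 4, size=100_000))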

You might ask, why do we have to vectorize like this? The answer is that the pymc3 distributions are not pure theano Ops, so random has to work around a lot of theano and numpy interaction, which ends up being slower.

How can you improve prior predictive speed? You can make a separate model for prior predictive sampling, setting the RVs’ shapes so that the deterministic you are interested in is computed entirely in theano instead of relying on numpy.vectorize, like this:

import time
import numpy as np
import pymc3 as pm
from matplotlib import pyplot as plt

times1 = []
times2 = []
sizes = 2 ** np.arange(1, 20+1)

# Original model: scalar RVs; samples are drawn with an explicit size argument.
with pm.Model() as model1:
    logP = pm.Uniform("logP", 0, 4)
    P = pm.Deterministic("P", 10**logP)


for size in sizes:
    # Draw from the scalar model; numpy.vectorize loops over the requested size.
    t1 = time.time()
    samples1 = pm.distributions.distribution.draw_values([model1.P], size=size)
    times1.append(time.time() - t1)
    # Rebuild the model with the size baked into the RV shape, so the
    # deterministic is computed entirely in theano in one call.
    with pm.Model() as model2:
        logP = pm.Uniform("logP", 0, 4, shape=(size,))
        P = pm.Deterministic("P", 10**logP)
    t2 = time.time()
    samples2 = pm.distributions.distribution.draw_values([model2.P])
    times2.append(time.time() - t2)
    assert samples1[0].shape == samples2[0].shape

plt.loglog(sizes, times1, '-ob')
plt.loglog(sizes, times2, '-or')

This gives you the following sampling times as a function of sample sizes:
[Figure: log-log plot of sampling time vs. number of samples; blue = model1 drawn with draw_values(..., size=size), red = model2 with shape=(size,)]

The constant sampling time you see for the first few red dots is the theano compilation overhead in _compile_theano_function. You only see this overhead in the blue curve’s first point because the compiled function is cached for later use.


If you want to add the pure numpy comparison, it looks like this:

[Figure: the same timing plot with a pure numpy baseline added]
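
For reference, my guess at what that pure numpy timing loop looks like for this model (logP ~ Uniform(0, 4), P = 10**logP), drawing directly with numpy and skipping theano entirely:

import time
import numpy as np

times_np = []
sizes = 2 ** np.arange(1, 20 + 1)
for size in sizes:
    t0 = time.time()
    # Draw logP ~ Uniform(0, 4) and transform, all in numpy.
    samples_np = 10.0 ** np.random.uniform(0, 4, size=size)
    times_np.append(time.time() - t0)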


This is definitely good to know - thanks very much @lucianopaz!

Unfortunately, though, this won’t work for my use case. A bit more detail: I want to generate prior samples for a set of parameters passed in by a user, and those parameters could have arbitrary relationships or definitions, so I can’t easily re-define the model with a new shape. Unless there is a way to clone variables into a new model context with new shapes?

@adrn, I haven’t been able to successfully clone an existing model, changing some of the RV shapes.
pymc3 strives to be flexible and to allow a single model to be used for prior predictive, posterior, and posterior predictive sampling. That flexibility, along with a design focused on posterior sampling, leads to the performance penalty you see. As it stands now, your only choice to speed things up is to generate a new model in which you manually set all RV shapes so that they broadcast correctly, and then draw values from them with size=None so as not to rely on vectorize. In general, doing that is very hard, and it saves you almost no pain compared to doing it directly in numpy (or tensorflow probability, for that matter).
Will your models only be used to draw samples from the prior?

@adrn With help from @lucianopaz, I have a PR to PyMC3 that provides pm.fast_sample_posterior_predictive, which is roughly O(n), where n is the number of variables, instead of O(n * m), where m is the number of samples. For relatively small models, I find about an order of magnitude speedup. This might be even more substantial as the input size grows; I haven’t measured it carefully. You can find it here: https://github.com/pymc-devs/pymc3/pull/3597

With respect to your sampling question – do you really want to sample from the prior only (and not from the posterior)? If so, PyMC3 may be overkill. It might be easier to simply write a forward generative model with something like scipy.stats or some other simulation framework.
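
For the prior in the original post, for example, such a forward generative model could be as small as this sketch (assuming, for illustration, that the parameters are independent):

from scipy import stats

# logP ~ Uniform(0, 4), then transform to P = 10**logP.
logP = stats.uniform(loc=0, scale=4).rvs(size=1_000_000)
P = 10.0 ** logP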

If you want the posterior predictive, here is what I have been doing to generate posterior predictive samples when the shapes differ (a rough code sketch follows the list):

  1. Train the model with the training data.
  2. Take the resulting trace and remove variables whose shape is influenced by the predictor (independent) variables and observations. I don’t have an algorithm for doing this; I do it by hand. Typically, theano missing input errors tell me when I have left something in that I shouldn’t have.
  3. Save the trace to a file.
  4. Build a new model object, with the same “internal” variables (these will be parameters and hyper-parameters), but with new predictors.
  5. With this new model we load the filtered trace saved in step 3.
  6. Using this new model object and the filtered trace, generate posterior predictive samples from novel inputs.
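
Here is a rough, hypothetical sketch of that workflow (the toy model, variable names, and data are made up for illustration, and the save/load of the trace to a file is omitted):

import numpy as np
import pymc3 as pm

# Hypothetical training data for a toy linear model.
x_train = np.linspace(0, 1, 50)
y_train = 2.0 * x_train + np.random.normal(0, 0.1, size=x_train.shape)

# 1. Train the model with the training data.
with pm.Model() as train_model:
    slope = pm.Normal("slope", 0, 10)
    sigma = pm.HalfNormal("sigma", 1)
    pm.Normal("y", mu=slope * x_train, sigma=sigma, observed=y_train)
    trace = pm.sample(1000, tune=1000)

# 2./3. Keep only the "internal" parameters whose shape does not depend on the
# predictors or observations (here: slope and sigma).
points = [{"slope": s, "sigma": g} for s, g in zip(trace["slope"], trace["sigma"])]

# 4.-6. Rebuild the model with new predictors and draw posterior predictive
# samples from the filtered points (the observed values are dummies that only
# set the shape of the predictions).
x_new = np.linspace(1, 2, 20)
with pm.Model() as predict_model:
    slope = pm.Normal("slope", 0, 10)
    sigma = pm.HalfNormal("sigma", 1)
    pm.Normal("y", mu=slope * x_new, sigma=sigma, observed=np.zeros_like(x_new))
    ppc = pm.sample_posterior_predictive(points)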

Awesome, thanks! I’ll give that PR a try.

Yea, I have a somewhat unusual use-case here. I do actually want to sample from the prior only: we have a custom rejection sampler, designed to solve a very specific problem in my field (astronomy), that operates on a large library of prior samples. We use rejection sampling because the posterior pdf is often extremely structured and multi-modal, but we still want to get samples out. I want to make it relatively straightforward for users to control the prior distribution using whatever arbitrary (possibly joint) distributions they want. This is what made me think pymc3 would be a good way to enable this: the scipy.stats distributions are great and would suffice if all parameters were independent, but here I also want to make it easy to specify non-separable distributions.
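
For what it’s worth, here is my rough reading of that scheme as a sketch (the likelihood function is hypothetical): draw a large library of prior samples, then accept each one with probability proportional to its likelihood.

import numpy as np

def log_likelihood(P):
    # Hypothetical, sharply peaked likelihood in P.
    return -0.5 * ((np.log10(P) - 2.0) / 0.05) ** 2

# Library of prior samples: logP ~ Uniform(0, 4), P = 10**logP.
prior_P = 10.0 ** np.random.uniform(0, 4, size=10_000_000)
logL = log_likelihood(prior_P)

# Accept each sample with probability L / L_max.
accept = np.random.uniform(size=prior_P.size) < np.exp(logL - logL.max())
posterior_P = prior_P[accept]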

I’m afraid my PR will need some expansion for your use case, then: it only works for posterior predictive sampling. But it should be trivial to extend it to sampling from the prior. I don’t know if the modifications @lucianopaz made address this problem.

If you are interested, let me know – you should be able to contact me through GitHub – and I’ll have a look at getting this set up for you.

@rpgoldman, @adrn, I think the fast and the regular posterior predictive samplers have the same complexity, O(nm). The fast one is faster because part of the m draws is vectorized. But in the end, both pymc3 and numpy will scale as O(nm), because the underlying random number generator isn’t parallelized.

If you are only interested in prior sampling and want to return the samples of a single variable, then you could use pymc4. We don’t have the sample_prior_predictive API in place yet, so you’ll need to work around that with some low level functions. I’m not near a computer so I can’t test this, but your use case may look something like:

import tensorflow as tf
import pymc4 as pm

@pm.model()
def model():
    logP = yield pm.Uniform("logP", 0, 4)
    return 10**logP

def sample_prior_predictive(sample_shape=()):
    def sampler(*args, **kwargs):
        # evaluate_model returns (return_value, state); calling model() builds the model
        return pm.evaluate_model(model())[0]

    return tf.vectorized_map(sampler, tf.ones(sample_shape))

samples = sample_prior_predictive(sample_shape=(1000000,))

I’m not really tensorflow savvy, so there may be a more appropriate way to parallelize the sampling. Maybe @junpenglao can comment on this.
