How to specify mu (the mean of the outcome) as a function of x (a feature) using the model context

Basically, I want to mimic the functionality of sample_posterior_predictive, but as a function of x (a feature).

The following code is not functional, but it illustrates what I want to achieve. Here, x is a PyTensor variable, and ff is the function that outputs mu_pred (the deterministic mu) as a function of x. Although I could achieve this by explicitly redefining mu_pred as mu_pred = X @ betas outside the model context, it would be much more convenient to directly use the mu that is already defined within the model context.

# This code does not work: sample_posterior_predictive returns numeric draws,
# not a symbolic expression, so mu_pred cannot be a pytensor.function output
x = pt.tensor(shape=(1,))
with model:
    pm.set_data({"X": x[None]})  # set_data also expects numeric values, not a symbolic x
    mu_pred = pm.sample_posterior_predictive(idata, var_names=["mu"]).posterior_predictive.mu
ff = pytensor.function([x], mu_pred)  # the goal: a compiled function of x
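For reference, the manual redefinition mentioned above would look something like this (a minimal sketch with hypothetical names, assuming a free RV "betas" and fixing it at its posterior mean):

import pytensor
import pytensor.tensor as pt

betas_mean = idata.posterior["betas"].mean(("chain", "draw")).values
x = pt.vector("x")
mu_pred_manual = x[None] @ betas_mean  # redefine mu = X @ betas outside the model
ff_manual = pytensor.function([x], mu_pred_manual)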

Is there any way to achieve this?

This is possible, but would it not be easiest to just wrap the set_data and sample_posterior_predictive in a small helper function?

def _resample_ppd(new_x, idata, model):
    """Resample the posterior predictive of mu at new input data."""
    with model:
        pm.set_data({"X": new_x})
        mu_pred = pm.sample_posterior_predictive(idata, var_names=["mu"]).posterior_predictive.mu
        return mu_pred

from functools import partial
ff = partial(_resample_ppd, idata=idata, model=model)
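Usage would then look like this (hypothetical input shape, matching the model's "X"):

import numpy as np

new_x = np.array([[0.5]])
mu_new = ff(new_x)  # xarray of mu draws at new_x, with (chain, draw, ...) dims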

Thank you, Jesse! This works well for predicting certain sets of x. However, to clarify further, it would be more interesting to convert this into a PyTensor function (which would be differentiable). This would enable tasks like optimization.

If all you want is a function that returns a draw from the mean, you can just do ff = pm.compile_pymc(inputs=inputs, outputs=mu) directly, where inputs is a list of your data variables.
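A minimal sketch of that suggestion (assuming mu is a named pm.Deterministic; note that if "X" was declared with pm.Data it is a shared variable, which pytensor.function will not accept as an explicit input, so a fresh symbolic stand-in is swapped in first):

import pytensor.tensor as pt
from pytensor.graph.replace import graph_replace

X_new = pt.matrix("X_new")  # stand-in for the shared data variable "X"
mu_given_X = graph_replace(model["mu"], {model["X"]: X_new})
ff = pm.compile_pymc(inputs=[X_new], outputs=mu_given_X)

draw = ff([[0.5]])  # each call samples the free RVs anew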

Note, however, that PyTensor does not support stochastic gradients by default (see this conversation here), so you would need to rewrite your model using the reparameterization trick, and you would be limited to models where all random variables can be expressed in this fashion (essentially the loc-scale family).
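For illustration, the reparameterization trick for a loc-scale variable looks like this (a minimal standalone sketch, not tied to the model above):

import pymc as pm

with pm.Model() as reparam_model:
    mu = pm.Normal("mu", 0, 1)
    sigma = pm.HalfNormal("sigma", 1)
    # instead of y = pm.Normal("y", mu, sigma), isolate the randomness in a
    # standard normal so gradients flow deterministically through mu and sigma
    eps = pm.Normal("eps", 0, 1)
    y = pm.Deterministic("y", mu + sigma * eps)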

Also there are plenty of gradient-free optimization methods that work great on certain classes of problems. You might be better off trying one of those first, before embarking on a huge quest to get stochastic gradients.
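For example, a gradient-free optimizer can be wrapped directly around the resampling helper from above (a minimal sketch; slow and noisy, since every evaluation re-runs sample_posterior_predictive, but it needs no gradients at all):

import numpy as np
from scipy.optimize import minimize

def objective(x):
    # average the predicted mu over all posterior draws at input x
    return float(_resample_ppd(x[None], idata, model).mean())

res = minimize(objective, x0=np.zeros(1), method="Nelder-Mead")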


You may want to do something like this workflow: PyMC_optimization.ipynb · GitHub

For the predictive function, you replace the RVs by the posterior draws; then the mean function should be differentiable with respect to the optimization inputs (whatever they are).
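In code, the idea is roughly the following (a minimal sketch with the hypothetical names from above: free RV "betas", data "X", Deterministic "mu"; the linked notebook does this more generally over all posterior draws):

import pytensor
import pytensor.tensor as pt
from pytensor.graph.replace import graph_replace

X_opt = pt.matrix("X_opt")  # the optimization input
betas_frozen = pt.constant(idata.posterior["betas"].mean(("chain", "draw")).values)

# replace the random variable by (a summary of) its posterior draws,
# and the shared data by the symbolic optimization input
mu_frozen = graph_replace(model["mu"], {model["betas"]: betas_frozen, model["X"]: X_opt})

ff = pytensor.function([X_opt], mu_frozen)
dff = pytensor.function([X_opt], pytensor.grad(mu_frozen.sum(), X_opt))  # differentiable now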

Yeah, looks like this is what I was looking for. Thanks

Hi @ricardoV94
I went through your notebook, PyMC_optimization.ipynb · GitHub, and got the gist of how it works. I just want to clarify one thing: when should we use rewrite_graph with the arguments ("ShapeOpt", "canonicalize", "stabilize", "specialize")? I guess this is part of a standardization routine after every graph transformation, but:

  • When would you recommend using this?
  • How can we identify if an issue arises due to not using it?

Below are the instances where it is used. If there are other resources that discuss this, please point me to those as well.


def rewrite(*vars: TensorVariable) -> Sequence[TensorVariable]:
    return rewrite_graph(vars, include=("ShapeOpt", "canonicalize", "stabilize", "specialize"))

# instance 1
def posterior_predictive_fn(
    ...
    # Clean up shape graph (or else blockwise fails)
    predicted_vars = rewrite(*predicted_vars)
    ...
    return batch_predicted_vars


# instance 2
predicted_y, predicted_mu = rewrite(predicted_y, predicted_mu)

I would recommend doing that by default; it will clean up the graph so vectorization behaves more nicely. In cases where you want to predict on a subset of the original model, say if you have a timeseries observation but for optimization care only about the last time point (or a smaller range), PyTensor can often generate a more efficient graph after rewrites.

You would notice it when evaluating the predictive function (or its gradient) is "too slow". Otherwise you would have to look at the generated graph and make sense of it. A more black-box approach is to just try with and without the rewrites and compare evaluation speed with %timeit.
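Concretely, the black-box comparison could look like this (a sketch in an IPython/notebook session, reusing the hypothetical mu_frozen, X_opt, and rewrite helper from above; note that pytensor.function applies its own rewrites at compile time, so differences mostly show up when the intermediate rewrite changes how later transforms like vectorize_graph expand the graph):

fn_without = pytensor.function([X_opt], mu_frozen)
fn_with = pytensor.function([X_opt], rewrite(mu_frozen)[0])

x_test = np.zeros((1, 3))  # hypothetical input shape
%timeit fn_without(x_test)
%timeit fn_with(x_test)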

Having said that, I would suggest running it by default. It should be generally safe, except for the specialize flag, which may introduce Ops that are not differentiable. It should never lead to silently incorrect results; that would be a bug!

Ok got it, thanks