How to use a TensorVariable in GP predict method

Below is a very basic toy model of what I want to do

import numpy as np
import pymc as pm

x = np.linspace(-10, 10, 100)
noise = np.random.randn(x.shape[0])
y = x * x / 10 + noise

with pm.Model():
    cov_func = gp.cov.Matern32(1, ls=[10])
    gp = pm.gp.Marginal(cov_func=cov_func)
    f = gp.marginal_likelihood('f', X=x.reshape(-1, 1), y=y, sigma=1)

    x_rv = pm.Uniform('x_rv', -10, 10, shape=(1,1)) # Xnew below has to have shape (n, 1) according to docs
    pred = f.predict(Xnew=x_rv, diag=True)
   ...

Ignore what comes after since the code execution hasn’t even made is that far yet, this code spits the error

Cell In[28], line 14
     11 f = gp.marginal_likelihood("f", X=x.reshape(-1, 1), y=y, sigma=1)
     13 x_rv = pm.Uniform('x_rv', -10, 10, shape=(1,1)) # Xnew below has to have shape (n, 1) according to docs
---> 14 pred = gp.predict(Xnew=x_rv)
     16 pm.sample()

File ~/python_venvs/pymc5_venv/lib/python3.10/site-packages/pymc/gp/gp.py:638, in Marginal.predict(self, Xnew, point, diag, pred_noise, given, jitter, model)
    636 mu, cov = self._predict_at(Xnew, diag, pred_noise, given, jitter)
    637 print(mu, cov)
--> 638 return replace_with_values([mu, cov], replacements=point, model=model)

File ~/python_venvs/pymc5_venv/lib/python3.10/site-packages/pymc/gp/util.py:77, in replace_with_values(vars_needed, replacements, model)
     68 fn = compile_pymc(
     69     inputs,
     70     vars_needed,
   (...)
     73     on_unused_input="ignore",
     74 )
     76 # Remove unneeded inputs
---> 77 replacements = {name: val for name, val in replacements.items() if name in input_names}
     78 missing = set(input_names) - set(replacements.keys())
     80 # Error if more inputs are needed

AttributeError: 'NoneType' object has no attribute 'items'

I have tried to circumvent this problem by using x_rv_arr = pm.floatX(x_rv), which as I understand ought to convert a TensorVariable to a ndarray, and passing this as Xnew but it didn’t work. I get the same error.

Hi Ominusliticus, any particular reason you want x_rv to be a pymc random variable? I think a bit more description about your goals would help me understand the problem.

If you just want to draw a new vector of x’s from the uniform distribution and push them through to generate predictions, I think the simplest approach is

with pm.Model() as m0:
    cov_func = pm.gp.cov.Matern32(1, ls=[10])
    gp = pm.gp.Marginal(cov_func=cov_func)
    f = gp.marginal_likelihood('f', X=x.reshape(-1, 1), y=y, sigma=1)

x_rv = np.random.uniform(low=-10,high=10,size=100)
x_new = x_rv.reshape(-1, 1)

with m0:
    pred = gp.predict(Xnew=x_new, diag=True)

If you really need to avoid stepping out the model context to generate your new samples, you could try this. It’s a bit awkward and I feel like it won’t be very fast but it works.

with pm.Model() as m0:
    cov_func = pm.gp.cov.Matern32(1, ls=[10])
    gp = pm.gp.Marginal(cov_func=cov_func)
    f = gp.marginal_likelihood('f', X=x.reshape(-1, 1), y=y, sigma=1)

    x_rv = pm.Uniform('x_rv', -10, 10) 
    x_new = pm.draw(x_rv,draws=100)[:,None]
    pred = gp.predict(Xnew=x_new, diag=True)

pm.draw() grabs samples from the uniform distribution, which seems to be what you’d want to pass over to predict.

1 Like

Hi daniel-saunders-phil,

I can give more context, and chances are I will be posting more questions on the project I am currently battling. I will try to summarize the big picture as best as I can:

  • I have data that describes some physical phenomena with output \mathbf Y and that depends on some inputs \mathbf x. The idea is to determine what \mathbf x_\text{true} (and a credible interval) would best reproduce the data.
  • I have K theories/models that have been developed to describe the phenomena: \{\mathcal M_i\}_{i=1,\ldots,K}.
  • I want to use a pm.Mixture distribution to combine these models in a likelihood, which I would use to figure out posterior distribution p(\mathbf x | \mathbf Y) for \mathbf x.
  • These models are cost-intensive to evaluate, so I want to use emulators (gaussian processes) to model them. So, ideally, the emulator consumes a sample \mathbf x_\text{sample} and spits our a prediction \mathbf Y_{\mathcal M_i} for each theory/model.

Therefore, being able to sample the uniform distribution in important. Could I adopt you example and register x_new using pm.Deterministic to record its chain?

1 Like

It seems like you are in case 3 here? I believe az.compare gives back a set of model model weights that would result in optimal PSIS-LOO, given the data (@OriolAbril am I interpreting that right?). That might be a more feisible approach than trying to endogenously model the selection between the theories/models (which might not be identified anyway, if they are sufficiently similar)

2 Likes

Okay that helps. Jesse’s suggestion is a good one. When comparing models in this way, it seems like we have three options: a big mixture model over all them, using information criteria like PSIS-LOO, or a bayes factor. A big mixture can be pretty tough for samplers and you might find it a lot easier to just build a K separate models and compare them.

Could I adopt you example and register x_new using pm.Deterministic to record its chain?

for sure.

If you still want to go the mixture route, I think the other big question is what, other than the mixture distribution, do you want to learn during inference? In the toy example, you might also want to learn:

  • the length scale
  • the values of x_rv

If you want to learn those, then just drawing from the uniform distribution wouldn’t help much. Instead you’d want something like this:

with pm.Model() as m0:
    cov_func = pm.gp.cov.Matern32(1, ls=[10])
    gp = pm.gp.Latent(cov_func=cov_func)
    x_rv = pm.Uniform('x_rv', -10, 10,shape=(100,1))
    f = gp.prior('f', X=x_rv)

Then you could treat f as the prediction you were previously grabbing from .predict and pass it further down the model. The predict method is an awkward fit for your task because the intention is that people will train their GP on one dataset first and then use predict for out-of-sample inference. But if everything needs to fit into one big model, I think this is a better route.

2 Likes

Thank you Jesse and Daniel (you can call me Kevin), for the help so far.

Before switching to PyMC, I had been using ptemcee where I did something the similar. Here the emulators were trained (using sklearn’s GaussianProcessRegressor) on training data that was independent from the inference data. This is one flaw in the code I shared above, the training data for the GP should not be the same as the inference data. The point of the emulators is to facilitate cheap model evaluations.

There is another nuance, I want the shape parameters of the Dirichlet distribution to have dependence on another variable \theta which can be measured in an experiment. In addition to learning the value \mathbf x_\text{true}, I want to learn the weights’ dependence on some measurable parameter \theta (which need not appear in the vector \mathbf x) in the experiment.

So code might look something like this

K = ... # Number of models
N = ... # Number of points for theta
 
emulator_training_x = ...
emulator_training_y = ...
observation_y = ... # has shape (N,)

with pm.Model() as m0:
    cov_func = pm.gp.cov.Matern32(1, ls=[10])
    emulators = []
    for k in range(K):
        gp = pm.gp.Marginal(cov_func=cov_func)
        emulators.append(
            gp.marginal_likelihood(
                f'model_emulator_{k}',
                X=emulator_training_x,
                y=emulator_training_y,
                sigma=0
            )
        )

    x = pm.Uniform(...)
    for n in range(N):
        comp_dist = [
            pm.Normal.dist(
                *emulator[k].predict(Xnew=x, diag=True),
                observed=observation_y[n]
            )
            for k in range(K)
        ]
        shape_parameters = pm.LogNormal(
            f'shape_parameters_{n}',
            mu=0,
            sigma=1,
            shape=K
        )
        weights = pm.Dirichlet(f'weight_{n}', a=shape_parameters)
        pm.Mixture(f'mixture_{n}', comp_dist=comp_dist, w=weights)
    
    idata = pm.sample(1_000_000, ...)

I guess the easiest way to describe what I need is a method that converts a sample RandomVariable to a numpy array. I can imagine training emulator with a difference library like GPy and I would still want the setup to work.

For those that find there way here. A solution that I ended up using is to construct RVs in my model using the gp.Marginal.conditional function and specified the shape of the x_rv RV.

In the end, it looks something like this:

inference_parametes = [...]  # List of strings
with pm.Model() as inference_model:
        inference_vars = pm.Uniform(
            'inference_vars',
            lower=inference_parameters_ranges[:, 0],
            upper=inference_parameters_ranges[:, 1],
            shape=(len(inference_parameters), 1)
        )

    for i, tau in enumerate(observation_times):
        comp_dists = [
            pm.MvNormal.dist(
                mu=[
                    pm.gp.Marginal(cov_func=cov_func).conditional(
                       name=f'{name}_{observable}_{i}',
                       Xnew=inference_vars,
                       given={
                           'X': emulator_design_points,
                           'y': emulator_training_data[name][i, :, j + 1],
                           'sigma': 0
                       },
                       shape=(len(inference_parameters),)
                    )
                    for j, observable in enumerate([...])
                ],
                cov=np.diag(observation_error[i]),
            )
            for name in hydro_names
        ]
          
        alpha = pm.Lognormal(
            f'alpha_{i}',
            mu=0.0,
            sigma=1.0,
            shape=len(hydro_names)
        )
        weights = pm.Dirichlet(f'Dirichlet_{i}', a=alpha)
        pm.Mixture(
            f'mix_{i}',
            w=weights,
            comp_dists=comp_dists,
            observed=observation_data[i, 1:].reshape(-1, 1),
        )

The problem with this is that it take forever to build for the number of observation times larger than 4. I will open another question regarding passing of observation data.