Use distributions for new input in Data containers


I am considering the following use case: I want to learn a function y=f_\theta(x) on historic data (model parameters \theta, features x and target y). For a scenario analysis, I have created a new set of features x_{\text{new}} as distribution (so it is a continuous version of best-case vs. worst-case scenario). Therefore, I would like to input new values x_{\text{new}} as a distribution and then predict values f_\theta(x_{\text{new}}) together with confidence intervals, including the uncertainty both in the model parameters and in x_{\text{new}}.

If x_{\text{new}} would be a normal array, I would use a Data container in the model and then change the data for a new prediction. Is there a best-practice of doing this with distributions? I did not find material on this, so any hint in the right direction is highly appreciated.

This code at the end of this comment is probably pretty close to what you already have, but just to be sure I’ve included it here as a reference for anyone else who wants to do this.

If x_{new} would be a normal array, I would use a Data container in the model

Our assumptions about x_{new} shouldn’t play into whether or not we use the Data container, because we would just be using actual draws of x_{new} to feed into the model to predict. Regardless of how you choose the distribution over x, it seems like you should be able to sample from that distribution and stick the draws into pm.Data.

If you really do want to train on draws of x and predict using a distribution over x without using draws in pm.Data, maybe it would be worth pulling out the parameter estimates from the trace by hand and making a separate posterior predictive model, forcing its parameter values to be the estimates from the first step. I’m unsure as to whether or not there’s a cleaner way to do it.

import numpy as np
import pymc3 as pm

# We assume that we have a linear regression with no intercept
# and a slope of 2. These 4 lines generate simulated data.
x    = np.random.randn(10)
beta = 2.0
eps  = np.random.randn(10) * 0.4
y    = beta*x + eps

# For posterior predictive sampling, we assume
# we won't have actual feature values but instead
# <n_noisy> draws around their mean
n_noisy    = 5
x_new_mean = np.random.randn(10)
x_new      = np.random.randn(n_noisy,10) * 0.2 + x_new_mean

# We fit a standard model using pm.Data to indicate that 
# we will be swapping data in and out later.
with pm.Model() as model:
    feature   = pm.Data('feature', x)
    coef      = pm.Normal('coef', sd=2)
    err_sd    = pm.HalfCauchy('err_sd', beta=0.5)
    response  = pm.Normal('response', mu=feature*coef, sigma=err_sd, observed=y)
    trace = pm.sample()

# Next, we manually draw a new feature vector, tell PyMC3 to replace the old one in the
# graph, and draw the samples. This should be fast because of the way that the posterior
# predictive sampling function is generated + cached.
n_draws    = 50
pp_samples = []

# Here I assume sampling at random from the noisy predictors
for i in range(n_draws):
    idx = np.random.choice(range(n_noisy))
    with model:
        pp_samples.append(pm.sample_posterior_predictive(trace=trace, samples=10, progressbar=False))

# Finally, we extract all of the individual response samples and stick them into one big array
response_samples = np.concatenate([d['response'] for d in pp_samples], axis=0)