In the "Predicting on hold-out data" section (4.1) of the quickstart, why can't I change the data a second time, like this (given I've already changed the data once, as in the tutorial, with `"x_obs": [-1, 0, 1.0]` and `coords={"idx": [1001, 1002, 1003]}`)?
```python
with model:
    # change the value and shape of the data
    pm.set_data(
        {
            "x_obs": [0.0, 0.0, 0.0],
            # use dummy values with the same shape:
            "y_obs": [0.0, 0.0, 0.0],
        },
        coords={"idx": [1004, 1005, 1006]},
    )
    idata.extend(pm.sample_posterior_predictive(idata))

idata.posterior_predictive["obs"].mean(dim=["draw", "chain"])
```
It just returns the same thing as the original code and hasn't updated the out-of-sample data at all.
Are you creating the data via `pm.MutableData()`?
Yes, I believe so. When the following code is run, the second attempt to adjust the mutable data has no effect.
```python
import numpy as np
import pymc as pm

rng = np.random.default_rng()

x = rng.standard_normal(100)
y = x > 0
coords = {"idx": np.arange(100)}

with pm.Model(coords=coords) as model:
    # create shared variables that can be changed later on
    x_obs = pm.MutableData("x_obs", x, dims="idx")
    y_obs = pm.MutableData("y_obs", y, dims="idx")

    coeff = pm.Normal("x", mu=0, sigma=1)
    logistic = pm.math.sigmoid(coeff * x_obs)
    pm.Bernoulli("obs", p=logistic, observed=y_obs, dims="idx")

    idata = pm.sample()

with model:
    # change the value and shape of the data
    pm.set_data(
        {
            "x_obs": [-1, 0, 1.0],
            # use dummy values with the same shape:
            "y_obs": [0, 0, 0],
        },
        coords={"idx": [1001, 1002, 1003]},
    )
    idata.extend(pm.sample_posterior_predictive(idata))

print(idata.posterior_predictive["obs"].mean(dim=["draw", "chain"]))

with model:
    # change the value and shape of the data
    pm.set_data(
        {
            "x_obs": [0.0, 0.0, 0.0],
            # use dummy values with the same shape:
            "y_obs": [0.0, 0.0, 0.0],
        },
        coords={"idx": [1004, 1005, 1006]},
    )
    idata.extend(pm.sample_posterior_predictive(idata))

idata.posterior_predictive["obs"].mean(dim=["draw", "chain"])
```
I'm not sure what happens when you try to extend an InferenceData object with multiple groups that share a name (e.g., `posterior_predictive`).
This seems to work for me?
```python
import numpy as np
import pymc as pm

rng = np.random.default_rng(12345)

x = rng.standard_normal(100)
y = x > 0
coords = {"idx": np.arange(100)}

with pm.Model(coords=coords) as model:
    # create shared variables that can be changed later on
    x_obs = pm.MutableData("x_obs", x, dims="idx")
    y_obs = pm.MutableData("y_obs", y, dims="idx")

    coeff = pm.Normal("x", mu=0, sigma=1)
    logistic = pm.math.sigmoid(coeff * x_obs)
    pm.Bernoulli("obs", p=logistic, observed=y_obs, dims="idx")

    idata = pm.sample()

with model:
    # change the value and shape of the data
    pm.set_data(
        {
            "x_obs": [-1, 0, 1.0],
            # use dummy values with the same shape:
            "y_obs": [0, 0, 0],
        },
        coords={"idx": [1001, 1002, 1003]},
    )
    pp1 = pm.sample_posterior_predictive(idata)

with model:
    # change the value and shape of the data
    pm.set_data(
        {
            "x_obs": [0.0, 0.0, 0.0],
            # use dummy values with the same shape:
            "y_obs": [0.0, 0.0, 0.0],
        },
        coords={"idx": [1004, 1005, 1006]},
    )
    pp2 = pm.sample_posterior_predictive(idata)

print(pp1.posterior_predictive["obs"].mean(dim=["draw", "chain"]))
# <xarray.DataArray 'obs' (idx: 3)>
# array([0.027 , 0.5025, 0.977 ])
# Coordinates:
#   * idx      (idx) int64 1001 1002 1003

print(pp2.posterior_predictive["obs"].mean(dim=["draw", "chain"]))
# <xarray.DataArray 'obs' (idx: 3)>
# array([0.51625, 0.5025 , 0.501 ])
# Coordinates:
#   * idx      (idx) int64 1004 1005 1006
```
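As an aside: if the goal were to keep both sets of out-of-sample predictions in a single object rather than holding two separate ones, another option is to concatenate the posterior predictive arrays along the `idx` dimension with plain xarray. A minimal sketch with placeholder arrays (the values below are made up, not drawn from the model above):

```python
import numpy as np
import xarray as xr

# stand-ins for two posterior predictive DataArrays with disjoint idx coords
pp_a = xr.DataArray(
    np.zeros((1, 4, 3)),
    dims=("chain", "draw", "idx"),
    coords={"idx": [1001, 1002, 1003]},
)
pp_b = xr.DataArray(
    np.ones((1, 4, 3)),
    dims=("chain", "draw", "idx"),
    coords={"idx": [1004, 1005, 1006]},
)

# concatenate along idx: one array covering all six out-of-sample points
combined = xr.concat([pp_a, pp_b], dim="idx")
print(combined.sizes["idx"])  # 6
```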
The issue is (was) with the usage of `extend`: it is not a merge or concatenation function. As explained in its docstring, `extend` defaults to `join="left"`, which means that groups present in both `idata` and `other` are kept from `idata` without modification; `join="right"` would instead replace the repeated groups in `idata` with the ones from `other`.
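The two join modes can be seen with bare `InferenceData` objects built via `az.from_dict` (the arrays here are placeholders, not real posterior predictive samples):

```python
import numpy as np
import arviz as az

# two InferenceData objects whose posterior_predictive groups share a name
idata_a = az.from_dict(posterior_predictive={"obs": np.zeros((1, 4, 3))})
idata_b = az.from_dict(posterior_predictive={"obs": np.ones((1, 4, 3))})

# default join="left": the group already in idata_a wins, so this is a no-op
idata_a.extend(idata_b)
print(float(idata_a.posterior_predictive["obs"].mean()))  # 0.0

# join="right": the repeated group is replaced by the one from idata_b
idata_a.extend(idata_b, join="right")
print(float(idata_a.posterior_predictive["obs"].mean()))  # 1.0
```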