import pymc as pm
from sklearn.model_selection import train_test_split

X = df[['first_feature', 'second_feature']]  # assumed: the two feature columns
y = df['indicator']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
with pm.Model() as logistic_model_pred:
    beta_0 = pm.Uniform('beta_0', -100, 100)
    beta_1 = pm.Normal('beta_1', -0.5, 1)
    beta_2 = pm.Normal('beta_2', 2, 1)
    first_feature = pm.Data('first_feature', value=X_train['first_feature'], mutable=True)
    second_feature = pm.Data('second_feature', value=X_train['second_feature'], mutable=True)
    observed = pm.Bernoulli('indicator', pm.math.sigmoid(beta_0 + beta_1 * first_feature + beta_2 * second_feature), observed=y_train)
    step = pm.Metropolis()
    pred_trace = pm.sample(random_seed=[1, 10, 100, 1000], step=step, init='auto')
with logistic_model_pred:
    pm.set_data({'first_feature': X_test['first_feature']})
    pm.set_data({'second_feature': X_test['second_feature']})
    ppc = pm.sample_posterior_predictive(trace=pred_trace)
y_score = ppc['posterior_predictive']['indicator'].mean(('chain', 'draw'))
print(y_score)
To provide more details: I have two features and one binary target variable, and there are 200 observations in df in total. I used an 80% training / 20% testing split. But I got this error:
ValueError: size does not match the broadcast shape of the parameters. (160,), (160,), (40,)
My guess is that the error comes from the size difference between the training and testing sets: in this case the training set has 160 observations and the testing set has 40.
(I'm not working on my local computer; this ran in an online environment. Not sure if that information helps.)