Hi, how’s it going?
I’m working on a project where I model individual-level predictions, but the predictions within each group must add up to a known group total.
Here’s an example:
import numpy as np
import pandas as pd

data = pd.DataFrame({
    'group': ['group_1'] * 6 + ['group_2'] * 6,
    'user': ['bob', 'phil', 'anthony'] * 2 + ['ben', 'nick', 'henry'] * 2,
    'var1': np.random.random_sample(12),
    'var2': np.random.random_sample(12),
    'y': np.random.randint(5, 20, size=12),
})
Now say I make predictions on new data. I want to add a constraint so that, within each group, the predictions on the new data sum to the same total as that group's y values in the input data.
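To make the constraint concrete, here is a minimal sketch of the check the predictions should pass. The numbers are made up for illustration, and it assumes the new data also carries a `group` column; the `pred` column and the `sums_match` helper are hypothetical names, not part of my actual code.

```python
import numpy as np
import pandas as pd

# Small fixed stand-in for the input data (values are made up for illustration).
train = pd.DataFrame({
    "group": ["group_1"] * 3 + ["group_2"] * 3,
    "y": [10, 12, 8, 15, 9, 11],
})

# Target total per group, taken from the observed y values.
group_totals = train.groupby("group")["y"].sum()

# Hypothetical predictions on new data; assumes a "group" column exists there too.
new = pd.DataFrame({
    "group": ["group_1"] * 2 + ["group_2"] * 2,
    "pred": [14.0, 16.0, 20.0, 10.0],
})

def sums_match(preds, totals, tol=1e-8):
    """Return a boolean Series: does each group's prediction sum equal its total?"""
    pred_sums = preds.groupby("group")["pred"].sum()
    targets = totals.reindex(pred_sums.index)
    ok = np.isclose(pred_sums.to_numpy(), targets.to_numpy(), atol=tol)
    return pd.Series(ok, index=pred_sums.index)

constraint_ok = sums_match(new, group_totals)
print(constraint_ok)
```

In this toy case group_1 satisfies the constraint and group_2 does not; the goal is for every group to pass.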
Here is my model for regression and predictions…
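import pymc as pm  # assumption: PyMC v4+; older installs use `import pymc3 as pm`

y = data['y']
with pm.Model() as m_5_1:
    # Priors for the intercept, the two slopes, and the noise scale
    a = pm.Normal("a", 10, 5)
    bA = pm.Normal("bA", 10, 5)
    bB = pm.Normal("bB", 10, 5)
    sigma = pm.Uniform("sigma", 0, 4)
    # Both predictors belong inside the Deterministic
    mu = pm.Deterministic("mu", a + bA * data['var1'].values + bB * data['var2'].values)
    result = pm.Normal("result", mu=mu, sigma=sigma, observed=y.values)
    trace = pm.sample()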
import xarray as xr

newdata = pd.read_csv('newdata.csv')
rng = np.random.default_rng()
new_data_0 = xr.DataArray(newdata['var1'].values, dims=["pred_id"])
new_data_1 = xr.DataArray(newdata['var2'].values, dims=["pred_id"])
# Assumes `trace` is an InferenceData (the default return of pm.sample() in PyMC v4+).
# Every posterior draw broadcasts against every new row: (chain, draw, pred_id).
post = trace.posterior
pred_mean = post["a"] + post["bA"] * new_data_0 + post["bB"] * new_data_1
predictions = xr.apply_ufunc(lambda mu, sd: rng.normal(mu, sd), pred_mean, post["sigma"])
My question is: how can I add a constraint so that the model takes the group_1 and group_2 sums into account and makes sure the predictions for each group add up to those totals (63 and 74)?
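For reference, the naive fallback would be to rescale the predictions after the fact, as in the sketch below. This is only a post-hoc adjustment under assumed names (`rescale_to_group_totals`, `pred`, and the toy numbers are hypothetical), not a constraint inside the model, which is what I'm really after. It also breaks if a group's raw predictions sum to zero.

```python
import pandas as pd

def rescale_to_group_totals(preds, totals, group_col="group", pred_col="pred"):
    """Scale predictions within each group so they sum to that group's total."""
    # Sum of raw predictions in each row's group, aligned row by row.
    pred_sums = preds.groupby(group_col)[pred_col].transform("sum")
    # Target total for each row's group.
    targets = preds[group_col].map(totals)
    out = preds.copy()
    out[pred_col] = preds[pred_col] * targets / pred_sums
    return out

# Group totals from the question; the raw predictions are made up for illustration.
totals = pd.Series({"group_1": 63, "group_2": 74})
raw = pd.DataFrame({
    "group": ["group_1"] * 3 + ["group_2"] * 3,
    "pred": [10.0, 20.0, 30.0, 5.0, 15.0, 20.0],
})

adjusted = rescale_to_group_totals(raw, totals)
adjusted_sums = adjusted.groupby("group")["pred"].sum()
print(adjusted_sums)
```

Proportional rescaling keeps the relative sizes of the predictions within a group, but it ignores the posterior uncertainty, so it is not a substitute for building the sum constraint into the model.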
I hope this example is clear enough; if not, please let me know.
Thanks for your help.