Hi, how’s it going?
I wanted to do multivariate regression manually instead of using GLM, and I just want to make sure this is a correct implementation in the most basic sense, disregarding the choice of priors and the specific numbers.
with pm.Model() as m_5_1:
    a = pm.Normal("a", 10, 5)
    bA = pm.Normal("bA", 10, 5)
    bB = pm.Normal("bB", 10, 5)
    sigma = pm.Uniform("sigma", 0, 4)
    # Linear predictor with both covariates, tracked as "mu"
    mu = pm.Deterministic("mu", a + bA * x['var1'] + bB * x['var2'])
    result = pm.Normal(
        "result", mu=mu, sigma=sigma, observed=y.values
    )
    trace = pm.sample()
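Written out, the model this code is meant to express is roughly the following. This is a sketch that assumes both predictors are intended to enter the mean, and that the positional arguments of `pm.Normal` are the mean and standard deviation:

```latex
\begin{aligned}
\text{result}_i &\sim \mathrm{Normal}(\mu_i, \sigma) \\
\mu_i &= a + b_A \,\text{var1}_i + b_B \,\text{var2}_i \\
a,\ b_A,\ b_B &\sim \mathrm{Normal}(10, 5) \\
\sigma &\sim \mathrm{Uniform}(0, 4)
\end{aligned}
```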
Then, for predictions on new data, I’m doing:
newdata = pd.read_csv('newdata.csv')
number_of_rows_in_newdata = newdata.shape[0]
new_data_0 = xr.DataArray(
    newdata['var1'],
    dims=["pred_id"]
)
new_data_1 = xr.DataArray(
    newdata['var2'],
    dims=["pred_id"]
)
pred_mean = (
    trace["a"][:number_of_rows_in_newdata] +
    trace["bA"][:number_of_rows_in_newdata] * new_data_0 +
    trace["bB"][:number_of_rows_in_newdata] * new_data_1
)
predictions = xr.apply_ufunc(
    lambda mu, sd: rng.normal(mu, sd),
    pred_mean,
    trace["sigma"][:number_of_rows_in_newdata],
)
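For comparison, here is a minimal NumPy-only sketch of the same prediction step, not tied to any particular PyMC version. It assumes the posterior draws for `a`, `bA`, `bB` and `sigma` are available as 1-D arrays of equal length. They are simulated here as stand-ins, and the new-data values are made up as well. What it illustrates is that posterior draws and new-data rows are separate axes: every draw is combined with every row through broadcasting, so the number of draws does not have to match the number of rows.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in posterior draws (hypothetical values; in practice these come from the trace).
n_draws = 1000
a_draws = rng.normal(10.0, 0.5, size=n_draws)
bA_draws = rng.normal(2.0, 0.1, size=n_draws)
bB_draws = rng.normal(-1.0, 0.1, size=n_draws)
sigma_draws = rng.uniform(0.5, 1.5, size=n_draws)

# Stand-in new data (hypothetical; replaces newdata['var1'] and newdata['var2']).
new_var1 = np.array([0.0, 1.0, 2.0])
new_var2 = np.array([1.0, 0.5, -1.0])

# Broadcast draws (axis 0) against new rows (axis 1): shape (n_draws, n_new).
pred_mean = (
    a_draws[:, None]
    + bA_draws[:, None] * new_var1[None, :]
    + bB_draws[:, None] * new_var2[None, :]
)

# One simulated outcome per (draw, row) pair, using that draw's sigma.
predictions = rng.normal(pred_mean, sigma_draws[:, None])

# Summaries per new row, taken across the draw axis.
pred_point = predictions.mean(axis=0)
pred_interval = np.percentile(predictions, [5.5, 94.5], axis=0)
```

Each column of `predictions` is then a set of simulated outcomes for one new row, which can be summarised by a mean or an interval as in the last two lines.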
Is there anything I’m doing wrong here, or is this a correct implementation of multivariate regression and the subsequent out-of-sample predictions?
Thanks!