Is this the correct way to do multivariate regression without using GLM?

Hi, how’s it going?

I wanted to do multivariate regression manually instead of GLM, and I just want to make sure this is the correct implementation, in the most basic sense, choice of priors and numbers disregarded.

with pm.Model() as m_5_1:
    a = pm.Normal("a", 10,5)
    bA = pm.Normal("bA",10,5)
    bB = pm.Normal("bB",10,5)
    sigma = pm.Uniform("sigma", 0,4)
    # Both slope terms belong inside the Deterministic so the stored "mu" is the full linear predictor
    mu = pm.Deterministic("mu", a + bA * x['var1'] + bB * x['var2'])

    result = pm.Normal(
        "result", mu=mu, sigma=sigma, observed=y.values
    )
    trace = pm.sample()

and then for predictions on new data I’m doing…

newdata = pd.read_csv('newdata.csv')
number_of_rows_in_newdata = newdata.shape[0]

new_data_0 = xr.DataArray(

new_data_1 = xr.DataArray(

pred_mean = (
    trace["a"][:number_of_rows_in_newdata] +
    trace["bA"][:number_of_rows_in_newdata] * new_data_0 +
    trace["bB"][:number_of_rows_in_newdata] * new_data_1
)

predictions = xr.apply_ufunc(lambda mu, sd: rng.normal(mu, sd), pred_mean, trace["sigma"][:number_of_rows_in_newdata])
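
For reference, the prediction step above amounts to broadcasting posterior draws against the rows of the new data and then adding observation noise. A minimal NumPy sketch of that idea follows. The draws and the new predictor values here are made-up stand-ins (not the actual trace or newdata.csv), and the names `new_var1` and `new_var2` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical posterior draws, standing in for the sampled a, bA, bB, sigma.
n_draws = 1000
a = rng.normal(10, 1, size=n_draws)
bA = rng.normal(2, 0.5, size=n_draws)
bB = rng.normal(-1, 0.5, size=n_draws)
sigma = rng.uniform(0.5, 1.5, size=n_draws)

# Hypothetical new predictor values, one entry per row of new data.
new_var1 = np.array([0.0, 1.0, 2.0])
new_var2 = np.array([1.0, 0.5, 0.0])

# Broadcast draws (rows) against new observations (columns).
pred_mean = a[:, None] + bA[:, None] * new_var1 + bB[:, None] * new_var2

# One simulated outcome per (draw, new row) pair, using that draw's sigma.
predictions = rng.normal(pred_mean, sigma[:, None])

print(pred_mean.shape, predictions.shape)
```

Each column of `predictions` is then a set of posterior predictive samples for one new row, so summaries such as `predictions.mean(axis=0)` give one value per new observation.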

Is there anything I’m doing wrong here, or is this the correct implementation of multivariate regression and the subsequent out-of-sample predictions?


This is a correct implementation of multiple regression, which is different from multivariate regression. Multivariate regression typically refers to a multivariate outcome (several response variables modeled jointly) rather than to multiple predictors. Here, you have a scalar outcome and two predictors. Otherwise, everything looks fine.
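
To make the distinction concrete, here is a small sketch using plain NumPy least squares rather than PyMC. The data are simulated and the coefficient values are arbitrary assumptions. The only point is the shape of the outcome: a vector for multiple regression, a matrix for multivariate regression.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200

# Design matrix shared by both cases: intercept, var1, var2.
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])

# Multiple regression: several predictors, one scalar outcome per row.
beta = np.array([10.0, 2.0, -1.0])
y = X @ beta + rng.normal(scale=0.5, size=n)
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# Multivariate regression: the outcome itself is a vector (here 2 columns).
B = np.array([[10.0, 0.0], [2.0, 1.0], [-1.0, 3.0]])
Y = X @ B + rng.normal(scale=0.5, size=(n, 2))
B_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)

print(y.shape, beta_hat.shape)  # outcome vector, coefficient vector
print(Y.shape, B_hat.shape)     # outcome matrix, coefficient matrix
```

Note that least squares fits each outcome column separately; a full multivariate model would usually also describe how the errors of the outcomes are correlated.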


Cool thanks so much for the feedback!