Hi there!
I’m trying to get a multiple regression off the ground, but I’m running into some problems.
The general structure of the model is like so:
y_{i,1} = \mathbf{x}_{i,1}^\top \boldsymbol{\beta}_{1} + \epsilon_{i,1}
y_{i,2} = \mathbf{x}_{i,2}^\top \boldsymbol{\beta}_{2} + \epsilon_{i,2}
dim(\mathbf{x}_{i,1}) = dim(\boldsymbol{\beta}_{1}) = (k_1,1)
dim(\mathbf{x}_{i,2}) = dim(\boldsymbol{\beta}_{2}) = (k_2,1)
dim(y_{i,1}) = dim(y_{i,2}) = (1,1)
\bf{Likelihood}
\boldsymbol{\epsilon}_i = (\epsilon_{i,1},\epsilon_{i,2}) \stackrel{iid}{\sim} MVN(mean=\mathbf{0}_2, cov=\mathbf{\Sigma}_\epsilon)
\bf{Priors}
(\beta_{1,1},...,\beta_{{k_1},1}) \stackrel{iid}{\sim} Normal(mean=0,stdev=10)
(\beta_{1,2},...,\beta_{{k_2},2}) \stackrel{iid}{\sim} Normal(mean=0,stdev=10)
\mathbf{sd}=(sd_1,sd_2) \stackrel{iid}{\sim} HalfCauchy(beta=2.5)
Chol(\mathbf{\Sigma}_\epsilon) \sim LKJCholeskyCov(eta=2, n=2, sd=\mathbf{sd})
In the example above, I have 2 regression equations: \mathbf{y}_1, which is explained by the variables \mathbf{x}_1, and \mathbf{y}_2, which is explained by the variables \mathbf{x}_2. Notice that this form is general enough that \mathbf{x}_1 and \mathbf{x}_2 can consist of completely different columns.
What is making my life a bit difficult is translating this into PyMC3. Let me show you my example using some fake data:
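To make the setup concrete, here is a plain-NumPy sketch of simulating data from this kind of model (all sizes, names, and covariance values are made up for illustration; my real data comes from a CSV):

```python
import numpy as np

rng = np.random.default_rng(42)
n, k1, k2 = 100, 5, 6                   # hypothetical sample size and regressor counts
X1 = rng.normal(size=(n, k1))           # regressors for equation 1
X2 = rng.normal(size=(n, k2))           # regressors for equation 2 (different columns allowed)
beta1 = rng.normal(scale=10, size=k1)   # coefficients drawn per the Normal(0, 10) priors
beta2 = rng.normal(scale=10, size=k2)
Sigma = np.array([[1.0, 0.3],
                  [0.3, 2.0]])          # made-up 2x2 error covariance
eps = rng.multivariate_normal(np.zeros(2), Sigma, size=n)  # correlated errors, shape (n, 2)
# One row per observation, one column per equation: shape (n, 2)
Y = np.column_stack([X1 @ beta1, X2 @ beta2]) + eps
print(Y.shape)  # (100, 2)
```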
import pymc3 as pm
import numpy as np
import pandas as pd
import theano
import theano.tensor as tt
# Reading in dataset
dataset = pd.read_csv("fake_multiple_multivariate_regression_data_covmtx.csv")
num_rows = len(dataset)
# Naming the dependent categorical variables
dep_vars = ["y1_cont","y2_cont"]
num_dep_vars = len(dep_vars)
# Exogenous variables impacting each depvar.
indep_vars = [["uno","x1","x2","x3","x4"],
["uno","x3","x4","x5","x6","x7"]]
indep_vars_y1 = len(indep_vars[0])
indep_vars_y2 = len(indep_vars[1])
x_data_y1 = dataset[indep_vars[0]].values
x_data_y2 = dataset[indep_vars[1]].values
y_data = dataset[dep_vars].values
with pm.Model() as model:  # model specifications in PyMC3 are wrapped in a with-statement
    # Priors for the betas: Normal(0, 10), as in the model above
    betas_y1 = pm.Normal('Betas_Y1', 0, sd=10, shape=indep_vars_y1)
    betas_y2 = pm.Normal('Betas_Y2', 0, sd=10, shape=indep_vars_y2)
    # Priors for the error covariance matrix
    sd_dist = pm.HalfCauchy.dist(beta=2.5, shape=num_dep_vars)
    packed_chol = pm.LKJCholeskyCov('chol_cov', eta=2, n=num_dep_vars, sd_dist=sd_dist)
    chol = pm.expand_packed_triangular(num_dep_vars, packed_chol, lower=True)
    # Per-equation linear predictors, each of shape (num_rows,)
    y1_hat = tt.dot(x_data_y1, betas_y1)
    y2_hat = tt.dot(x_data_y2, betas_y2)
    # Here is my main problem: this stacks to shape (2, num_rows),
    # while y_data has shape (num_rows, 2)
    mu = pm.Deterministic('mu', tt.stack([y1_hat, y2_hat]))
    # Define the MvNormal likelihood with the given Cholesky factor
    vals = pm.MvNormal('vals', mu=mu, chol=chol, observed=y_data)
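To pin down what I mean by "re-packing", here is the shape situation in plain NumPy (as far as I can tell, tt.stack follows the same default-axis convention as np.stack):

```python
import numpy as np

n = 4                                     # tiny example
y1_hat = np.arange(n, dtype=float)        # stand-in for the first linear predictor
y2_hat = -np.arange(n, dtype=float)       # stand-in for the second

# Default stacking adds a new leading axis: shape (2, n)
print(np.stack([y1_hat, y2_hat]).shape)           # (2, 4)
# Stacking along the last axis gives the (n, 2) layout of the observed data
print(np.stack([y1_hat, y2_hat], axis=-1).shape)  # (4, 2)
```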
In the example above, I’m having a hard time re-packing y1_hat and y2_hat into a single array-like element so I can use the MvNormal.
Does anyone have any insight on how to do so?
Thank you!