I am trying to perform a linear regression with 10 independent and 3 dependent variables. For every sample, the sum of independent variables equals one and so does the sum of the dependent variables. Data is divided into 5 groups. I am trying to build a no-pooling model with independent inference for each group. While I can build individual models for each group, I thought of following the method of group_indexes in the likelihood function so that I can further build hyper-priors and test a hierarchical model. The coefficients of linear regressions are also bounded to [0, 1] so a Dirichlet prior was the option. A Dirichlet prior also lets me simultaneously model the 3 dependent variables that add up to one (this is feasible as my independent variables also add up to one) My model looks like this.
from scipy import stats
import numpy as np
import pymc3 as pm
n_features = 10
dependent_count = 3
group_count = 5
x = stats.dirichlet(a=np.ones(n_features)).rvs(1000)
y = stats.dirichlet(a=np.ones(dependent_count)).rvs(1000)
idx = np.random.randint(group_count, size=1000) # sample group numbers
with pm.Model() as md:
# prior
α = pm.Dirichlet('α', a=[1]*dependent_count,
shape=(5, 10, 3))
# taking a dot product of α for each group with the data
t = []
for i in range(x.shape[0]):
t.append(pm.math.dot(x[i], α[idx[i]]))
t_ = tt.stacklists(t)
ϵ = pm.HalfCauchy('ϵ', 2, shape=(5,3))
#likelihood
y_pred = pm.Normal('y_pred', mu=t_ ,sd=ϵ[idx], observed=y)
trace_h = pm.sample(1000, chains=2)
Now this takes forever to run and is stuck at “Initializing NUTS using jitter+adapt_diag”. I’m assuming the way in which I’m creating t_ is wrong. When passing a numpy array, I got an error : ValueError: setting an array element with a sequence. Please keep in mind that the dot product is of the coefficients and independent variables for each group hence the for loop.