Hi,
I have the below pymc3 model of two correlated random variables, where the correlation is driven by a beta (to be inferred) times an observable.
The goal is to get a data driven correlation variable into the multi-variate normal model.
I’ve tried to specify a mu vector with shape (10, 2), and a vector of 2x2 covariance matrices with length 10. Calling MvNormal with those arguments gives me shape related errors. Looking at the pymc3 code, it appears that this is just not supported.
As a workaround, I’ve implemented the model below, which works and finds the correct beta once sampled. However the sampling process is extremely slow.
Any advice on how to speed this up would be greatly appreciated.
Thanks,
Matt
import numpy as np
import pandas as pd
import pymc3 as pm
import scipy as sp
import theano
from theano import tensor as T
## Generate toy data
## Note in our real world example we only have A_obs, B_obs and drivers avalible
## the goal is to infer beta
N = 10
true_beta = 1.
true_mu = np.array([0, 0])
drivers = 2.0 * np.random.randn(N)
correls = 2.0 * sp.special.expit(drivers*true_beta)-1.0
observations = []
for i in range(N):
covar = np.array([[1, correls[i]],[correls[i], 1]])
observations.append(sp.stats.multivariate_normal.rvs(true_mu, covar, size=1).tolist())
data_df = pd.DataFrame(observations, columns=['A_obs','B_obs'])
data_df['drivers'] = drivers
data_df['correls'] = correls
def corrmat(rho):
return T.fill_diagonal(T.alloc(rho, 2, 2), 1.)
with pm.Model() as model:
beta = pm.Normal('beta', mu=0, sd=10)
driver_vec = beta * data_df['drivers'].values
rho_vec = (2.0 * pm.math.invlogit(driver_vec)) - 1.0
## for each observation, manually construct the correlation matrix and a
## bivariate normal distribution.
for i in range(len(data_df['drivers'].values)):
C = corrmat(rho_vec[i])
data = pm.MvNormal('data_%d' % i,
mu=np.array([0, 0]),
cov=C,
observed=data_df[['A_obs', 'B_obs']].iloc[i, :].values
)