This looks mostly right to me although the model needs a few alterations. I’ve made some changes below:
Here’s the part for generating fake data:
import numpy as np
np.random.seed(123)
n = 50
p = 10 # Number of coefficients
# Create PxN matrix of covariates
X = np.random.uniform(-1,1,size=(p,n))
X_std = 0.01 + np.random.exponential(0.05,size=(p,n))
# Create vector of coefficients with P elements
coefs = np.random.randn(p)
# Create observation with non-measurement noise added
y = coefs.dot(X) + np.random.normal(0, 0.15,n)
# Create observation measurement noise
y_std = 0.01 + np.random.exponential(0.05, n)
Then here’s the PyMC3 model. I rearranged the order of the model variables a bit and changed the observed keyword to instead just using all latent variables. This makes it a little easier to understand in my opinion. We model some true but unobserved y_true which is then used in the standard likelihood calculation (and which should incorporate sigma, not y_std).
import pymc3 as pm
with pm.Model() as m_toy:
X_pm = pm.Normal('X_pm', mu=X, sd=X_std,shape=(p,n))
coefs_pm = pm.Normal('coefs_pm',mu=0.,sd=10,shape=p)
mu = pm.math.dot(coefs_pm,X_pm)
sigma = pm.Exponential('sigma',1)
y_true = pm.Normal('y_true',y, y_std, shape=n)
likelihood = pm.Normal('likelihood', mu-y_true, sigma, shape=n)
trace_toy = pm.sample(1000)
Using this setup, I was able to draw about 60 samples / s across 4 cores though some of the chains had divergences. I’m interested to see how other people would solve this problem.