Hi,
I’m very new to pymc3 and Bayesian. So sincere apology in advance if this question is too simplistic.
I have a toy data of 10 patients, each patient has 5 time-points, so the model will also have to accomodate the patient grouping. I have 2 predictors and thus 2 slopes. I am trying to build a bayesian hierarchical model that takes time-points as an outcome and grouped by patient unique id. I did not include intercept for this particular instance as I just want to see if my model is able to estimate betas close enough to the true betas I initially set.
The toy data was generated as per:
y = (b1x1) + (b2x2) + e.
where e = randomly generated number; x2 = 2*x1.
I chose the betas - no particular sequence.
My code is below:
data = dummy_data
pat_idx = dummy_data.iloc[:, 0]
timepoint = dummy_data.iloc[:, 1]
predictor = dummy_data.iloc[:, 4:6]
e = dummy_data.iloc[:, 6]
n_x = x.shape[1]
n_pat = len(np.unique(pat_idx))
with pm.Model() as model_dummy:
# Priors
beta = pm.Normal('beta', mu=0, sd=100, shape=(n_pat, n_x, 1))
eps = pm.HalfCauchy('eps', beta=1)
# Linear model
tpoint_est = (pm.math.dot(x, beta))[pat_idx]
# Data likelihood
y_like = pm.Normal('y_like', mu=tpoint_est, sigma=eps, observed=y)
with model_dummy:
trace = pm.sample(3000, target_accept=0.9)
out = pm.summary(trace)
The code works. However, my biggest issue is that the beta estimates given by the model is way off. I am wondering where I have gone wrong. My guess is perhaps my priors were not set properly. I’d much appreciate any tips/advice/feedback.
Thank you.
PS. Toy data link is below:
https://github.com/mahideel/my-repo/blob/main/dummy_data2_w_e.csv