Bayesian Hierarchical Modelling with multiple timepoints for patients as an outcome

Hi,

I’m very new to pymc3 and Bayesian. So sincere apology in advance if this question is too simplistic.

I have a toy data of 10 patients, each patient has 5 time-points, so the model will also have to accomodate the patient grouping. I have 2 predictors and thus 2 slopes. I am trying to build a bayesian hierarchical model that takes time-points as an outcome and grouped by patient unique id. I did not include intercept for this particular instance as I just want to see if my model is able to estimate betas close enough to the true betas I initially set.

The toy data was generated as per:
y = (b1x1) + (b2x2) + e.
where e = randomly generated number; x2 = 2*x1.
I chose the betas - no particular sequence.

My code is below:
data = dummy_data

pat_idx = dummy_data.iloc[:, 0]
timepoint = dummy_data.iloc[:, 1]
predictor = dummy_data.iloc[:, 4:6]
e = dummy_data.iloc[:, 6]

n_x = x.shape[1]
n_pat = len(np.unique(pat_idx))

with pm.Model() as model_dummy:

# Priors

beta = pm.Normal('beta', mu=0, sd=100, shape=(n_pat, n_x, 1))
eps = pm.HalfCauchy('eps', beta=1)

    
# Linear model
tpoint_est = (pm.math.dot(x, beta))[pat_idx]
     
# Data likelihood
y_like = pm.Normal('y_like', mu=tpoint_est, sigma=eps, observed=y)

with model_dummy:
trace = pm.sample(3000, target_accept=0.9)
out = pm.summary(trace)

The code works. However, my biggest issue is that the beta estimates given by the model is way off. I am wondering where I have gone wrong. My guess is perhaps my priors were not set properly. I’d much appreciate any tips/advice/feedback.

Thank you.
PS. Toy data link is below:
https://github.com/mahideel/my-repo/blob/main/dummy_data2_w_e.csv

Hi,

The sd=100 on the Normal seems very broad. But that depends on your data. The link to the CSV you added does not work, so it’s hard to tell.

In general, you can use pm.sample_prior_predictive to make predictions with the model before fitting it to the data. Your models prior predictions should be “realistic”.

cheers

Hi,

Thanks for your feedback. Sorry about the broken link. I’ve attached the .csv file - hopefully it’ll work.

dummy_data2_w_e.csv (2.3 KB)

I will try pm.sample.predictive. My aim is to get the model to provide a good estimation of the beta (or the coefficient). So will definitely try that.

Thank you.