Hello everyone,
I am new to pymc3 and I’d like to model some time series data with a hierarchical model.
Basically, I used pymc3 to derive parameters of a sum of exponential functions that describe my observed data over time. This part is working correctly if i provide one data set at a time.
with pm.Model() as model_biexpo:
ke = pm.Exponential('ke', lam=1)
ka = pm.Exponential('ka', lam=50)
c = pm.Exponential('c', lam=1e-4)
epsilon = pm.HalfNormal('epsilon',sigma=5)
measures = pm.Deterministic('measures', bi_exponential(ke, ka, c, t))
y_like = pm.Normal('y_like', mu=measures, observed=observed, sd=epsilon)
dec_trace = pm.sample(15000, progressbar=False, tune=8000)
burned_dec_trace = dec_trace[8000:]
where
observed
& t
are measured data and time of measurement respectively. So far so good. results are great.
Following this document & this one, I try to implement a hierarchical model to fit several dataset at the same time.
For my first try, i don’t want to look after hyperparameters in my model, I want to stick to the unpooled model like in radon example.
My data (as a pandas Dataframe) are arranged like so:
hours | measures | |
---|---|---|
code | ||
0 | 3.95 | 6.044405e-06 |
0 | 22.25 | 2.673205e-06 |
0 | 46.07 | 1.961030e-06 |
0 | 70.81 | 1.262165e-06 |
0 | 142.49 | 7.512900e-07 |
1 | 2.00 | 1.355847e-05 |
1 | 26.04 | 1.151494e-06 |
1 | 49.77 | 8.385432e-07 |
2 | to continue with some other data... |
For each code, several time and measurements. Note that the number of time points are not necessary the same.
I don’t how to organize my data to fit into the model derived from here:
with pm.Model() as unpooled_model_biexpo:
ke = pm.Exponential('ke', lam=1, shape=nb_sbjct)
ka = pm.Exponential('ka', lam=50, shape=nb_sbjct)
c = pm.Exponential('c', lam=1e-4, shape=nb_sbjct)
t = theano.tensor.dcol('t')
epsilon = pm.HalfNormal('epsilon',sigma=5)
tps_measures =data.loc[:,'hours'].values.reshape(len(idx_pat),1).astype('double')
measures = pm.Deterministic('measures', tt.function([ke, ka, c, t],
bi_exponential(ke, ka, c, tps_measures), on_unused_input='warn'))
y_like = pm.Normal('y_like', mu=measures, observed=data.loc[:,'measures'], sd=epsilon)
dec_trace = pm.sample(15000, progressbar=False, tune=8000)
burned_dec_trace = dec_trace[8000:]
if I understand correctly nb_subjct
should be the number of different code in my dataset, say: nb_suject=len(data.index.unique())
.
I don’t figure out how to handle the measurements properly as the number of data point vary from subject to subject. Despite my several attempts, I didn’t find a way to properly encode this information for each subject.
Any help would be greatly appreciated.
Best regards
LF