I am very new to Bayesian data analysis, pymc3, and hierarchical models, and I am hoping to use their capabilities to understand my data better. I am wondering if you can help me interpret and check my work.
I am modeling data emitted from a neural network training process; below is a quick summary of the data:
y = log of dev set loss; continuous variable between 0 and inf
x variable of interest = % of weights frozen; continuous variable between 0 and 1
cluster: epochs, integers ranging from 1-5 (recoded to 0-4)
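For context, coords and epoch_nunique (referenced in the model below) are built from a pandas DataFrame roughly like this; the column names are just the ones I use here:

import numpy as np

# data is a pandas DataFrame with columns: log_loss, freeze_p, epoch (epoch already recoded to 0-4)
epoch_nunique = data.epoch.nunique()
coords = {"obs_id": np.arange(len(data)), "epoch": np.arange(epoch_nunique)}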
So, my model setup looks like:
import pymc3 as pm

with pm.Model(coords=coords) as varying_intercept_slope:
    """
    Varying slope + intercept model
    """
    freeze_idx = pm.Data("freeze_idx", data.freeze_p, dims="obs_id")
    epoch_idx = pm.Data("epoch_idx", data.epoch, dims="obs_id")
    # hyperpriors: population-level mean intercept and mean slope
    a = pm.Normal("a", mu=0.0, sigma=100)
    b = pm.Normal("b", mu=0.0, sigma=100)
    # hyperpriors: between-epoch standard deviations of the intercepts and slopes
    sigma_a = pm.Exponential("sigma_a", 5.0)
    sigma_b = pm.Exponential("sigma_b", 5.0)
    # per-epoch intercepts and slopes, drawn from the population-level distributions above
    a_epoch = pm.Normal("alpha_epoch", mu=a, sigma=sigma_a, shape=epoch_nunique)
    b_epoch = pm.Normal("beta_epoch", mu=b, sigma=sigma_b, shape=epoch_nunique)
    # model error
    eps = pm.Exponential("sigma", 1.0)
    # expected value of y per observation, using that observation's epoch
    theta = a_epoch[epoch_idx] + b_epoch[epoch_idx] * freeze_idx
    # data likelihood
    y = pm.Normal("y", theta, sigma=eps, observed=data.log_loss, dims="obs_id")
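For completeness, here is roughly how I then sample and look at the posterior (the number of draws, tuning steps, and target_accept are just the settings I happened to use):

import arviz as az

with varying_intercept_slope:
    trace = pm.sample(2000, tune=2000, target_accept=0.9, return_inferencedata=True)

az.plot_trace(trace, var_names=["a", "b", "alpha_epoch", "beta_epoch"])
az.summary(trace, round_to=2)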
I am a little lost in the interpretation of the results (below), however. a seems to represent the hierarchical intercept, but I am not sure how to think about it beyond this point. Looking further, it seems to represent the mean of y, but I could use some validation here and below. b seems to represent the hierarchical slope, but I am also not sure how to think about it beyond this point. Looking further, it possibly represents the average effect of my central x variable of interest: the % of frozen weights.
I think the parameters of interest, then, are alpha_epoch, which shows the mass of the most likely parameter values for each intercept of my 5 epoch “clusters”, and beta_epoch, which similarly shows the mass of the most likely parameter values for each slope (β) of my 5 epoch “clusters”. The question then is: how do I interpret beta_epoch?
az.summary(trace, round_to=2) tells me that the blue mass in beta_epoch is epoch zero and has a value of 0.14. Would I then interpret it as a typical linear regression estimate? For example: given that it is epoch zero, a one-percent increase in the % of weights frozen is associated with a 0.14 increase in the log of the dev set loss, on average.
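To make the interpretation I have in mind concrete, here is the arithmetic I am doing in my head; alpha_epoch_0 is a made-up intercept purely for illustration, and because freeze_p lives on a 0-1 scale I am unsure whether the 0.14 applies per unit or per percentage point:

# Illustrative only: plug point estimates into the epoch-zero regression line.
beta_epoch_0 = 0.14    # posterior mean for beta_epoch[0] from az.summary
alpha_epoch_0 = -1.0   # hypothetical intercept, just to have a full line

def expected_log_loss(freeze_p, alpha=alpha_epoch_0, beta=beta_epoch_0):
    return alpha + beta * freeze_p

# Freezing all weights instead of none shifts the expected log-loss by beta itself:
print(expected_log_loss(1.0) - expected_log_loss(0.0))    # ≈ 0.14
# A one-percentage-point increase (0.50 -> 0.51) shifts it by 0.01 * beta:
print(expected_log_loss(0.51) - expected_log_loss(0.50))  # ≈ 0.0014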
Lastly, if you could spoil me even more: I am interested in combining data from two different neural network training processes, since they collect the same data but are based on different training sets. In this case, I think it would be good to model each epoch as its own cluster within its own data set, but allow some sharing of information between the two. Any thoughts on how I might capture this with a new binary variable as a higher-order cluster, where 0 represents data from data set A and 1 represents data from data set B?
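To make that last question concrete, this is roughly what I have in mind; dataset_idx, dataset_nunique, and the nested indexing are my guesses at a structure, not something I have validated:

with pm.Model(coords=coords) as nested_model:
    freeze_idx = pm.Data("freeze_idx", data.freeze_p, dims="obs_id")
    epoch_idx = pm.Data("epoch_idx", data.epoch, dims="obs_id")
    dataset_idx = pm.Data("dataset_idx", data.dataset, dims="obs_id")  # 0 = data set A, 1 = data set B

    # global mean intercept and slope
    mu_a = pm.Normal("mu_a", mu=0.0, sigma=10)
    mu_b = pm.Normal("mu_b", mu=0.0, sigma=10)
    sigma_a_ds = pm.Exponential("sigma_a_ds", 5.0)
    sigma_b_ds = pm.Exponential("sigma_b_ds", 5.0)

    # per-data-set intercepts and slopes, partially pooled toward the global values
    a_ds = pm.Normal("a_ds", mu=mu_a, sigma=sigma_a_ds, shape=dataset_nunique)
    b_ds = pm.Normal("b_ds", mu=mu_b, sigma=sigma_b_ds, shape=dataset_nunique)
    sigma_a_epoch = pm.Exponential("sigma_a_epoch", 5.0)
    sigma_b_epoch = pm.Exponential("sigma_b_epoch", 5.0)

    # per (data set, epoch) intercepts and slopes, pooled toward their data set's values
    a_epoch = pm.Normal("alpha_epoch", mu=a_ds[:, None], sigma=sigma_a_epoch,
                        shape=(dataset_nunique, epoch_nunique))
    b_epoch = pm.Normal("beta_epoch", mu=b_ds[:, None], sigma=sigma_b_epoch,
                        shape=(dataset_nunique, epoch_nunique))

    # model error and likelihood
    eps = pm.Exponential("sigma", 1.0)
    theta = a_epoch[dataset_idx, epoch_idx] + b_epoch[dataset_idx, epoch_idx] * freeze_idx
    y = pm.Normal("y", theta, sigma=eps, observed=data.log_loss, dims="obs_id")

The idea is that each epoch still gets its own intercept and slope within each data set, but both data sets' epoch-level parameters are pulled toward shared data-set-level means, which is the "sharing of information" I described above. Does that look like a sensible way to encode the binary data-set variable as a higher-order cluster?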