Help Interpreting Hierarchical Linear Regression Results

I am very new to Bayesian data analysis, pymc3, and hierarchical models, and I am hoping to use them to understand my data better. Could you help me interpret and check my work?

I am modeling data emitted from a neural network training process. Here is a quick summary of the data:

y = log of dev set loss; continuous variable between 0 and inf
x var of interest = % of weights frozen; continuous variable between 0 and 1
cluster = epoch; integers ranging from 1-5 (recoded to 0-4)
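
For reference, here is a minimal sketch of the recoding step and of the helper objects the model below relies on (`coords` and `epoch_nunique`). The raw epoch values are made up for illustration, and `label_to_idx` is a hypothetical helper name; how these objects were actually built is not shown in this post.

```python
# Hypothetical raw epoch labels, one per observation (values 1-5 as in the summary above).
raw_epochs = [1, 2, 3, 4, 5, 1, 2, 3, 4, 5]

# Map each distinct epoch label to a zero-based integer index (1 -> 0, ..., 5 -> 4).
epoch_labels = sorted(set(raw_epochs))
label_to_idx = {label: i for i, label in enumerate(epoch_labels)}
epoch_idx = [label_to_idx[e] for e in raw_epochs]

# Number of clusters, used as the shape of the per-epoch parameters.
epoch_nunique = len(epoch_labels)

# Named dimensions: one coordinate per observation and one per epoch.
coords = {
    "obs_id": list(range(len(raw_epochs))),
    "epoch": epoch_labels,
}

print(epoch_idx)
print(epoch_nunique)
```

The zero-based index matters because it is used to pick the matching entry out of the per-epoch parameter vectors.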

So, my model setup looks like:

import pymc3 as pm

# data, coords, and epoch_nunique are defined earlier (not shown)
with pm.Model(coords=coords) as varying_intercept_slope:
    '''
    Varying slope + intercept model
    '''
    freeze_idx = pm.Data("freeze_idx", data.freeze_p, dims="obs_id")
    epoch_idx = pm.Data("epoch_idx", data.epoch, dims="obs_id")

    # hyperpriors: population-level means of the intercepts (a) and slopes (b)
    a = pm.Normal("a", mu=0.0, sigma=100)
    b = pm.Normal("b", mu=0.0, sigma=100)

    # hyperpriors: between-epoch standard deviations of the intercepts and slopes
    sigma_a = pm.Exponential("sigma_a", 5.0)
    sigma_b = pm.Exponential("sigma_b", 5.0)

    # per-epoch intercepts (alpha) and slopes (beta), drawn from the hyperpriors above
    a_epoch = pm.Normal("alpha_epoch", mu=a, sigma=sigma_a, shape=epoch_nunique)
    b_epoch = pm.Normal("beta_epoch", mu=b, sigma=sigma_b, shape=epoch_nunique)

    # model error
    eps = pm.Exponential("sigma", 1.0)

    # expected value of y per epoch
    theta = a_epoch[epoch_idx] + b_epoch[epoch_idx] * data.freeze_p.values

    # Data likelihood
    y = pm.Normal("y", theta, sigma=eps, observed=data.log_loss, dims='obs_id')
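
Written out in math notation, the code above appears to correspond to the following model. This is a restatement under assumptions, not an independent derivation: $x_i$ stands for `freeze_p`, $j[i]$ for the epoch index of observation $i$, and the second argument of each Normal is a standard deviation.

```latex
\begin{aligned}
a,\; b &\sim \mathrm{Normal}(0,\, 100) \\
\sigma_a,\; \sigma_b &\sim \mathrm{Exponential}(5) \\
\alpha_j &\sim \mathrm{Normal}(a,\, \sigma_a), \qquad j = 0, \dots, 4 \\
\beta_j &\sim \mathrm{Normal}(b,\, \sigma_b), \qquad j = 0, \dots, 4 \\
\sigma &\sim \mathrm{Exponential}(1) \\
y_i &\sim \mathrm{Normal}\!\left(\alpha_{j[i]} + \beta_{j[i]}\, x_i,\; \sigma\right)
\end{aligned}
```

Read this way, $a$ and $b$ are the means of the distributions that the per-epoch intercepts $\alpha_j$ and slopes $\beta_j$ are drawn from, and $\sigma_a$ and $\sigma_b$ control how far individual epochs can stray from those means.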

I am a little lost in the interpretation of the results (below), however.

a seems to be the hierarchical (population-level) intercept, but I am not sure how to think about it beyond that. Looking further, it seems to represent the mean of y, but I could use some validation here and below.

b seems to be the hierarchical mean (mu) of the slopes, but I am also not sure how to think about it beyond that. Looking further, it possibly represents the average effect of my central x variable of interest: the % of frozen weights.

I think the parameters of interest, then, are alpha_epoch, which shows the posterior mass of the most likely values for the intercept of each of my 5 epoch “clusters”.

Similarly, beta_epoch shows the posterior mass of the most likely values for the slope \beta of each of my 5 epoch “clusters”. The question then is how to interpret beta_epoch.

az.summary on the sampled trace of varying_intercept_slope (with round_to=2) tells me that the blue mass in beta_epoch is epoch zero and has a value of 0.14. Would I then interpret it as a typical linear regression estimate? For example: given that it is epoch zero, a one-percent increase in the % of weights frozen is associated with a 0.14 increase in the log of the dev set loss, on average.
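
To make the units in that reading concrete, here is a small arithmetic sketch. It assumes that `freeze_p` is stored on the 0-1 scale described in the data summary and that the loss was logged with the natural logarithm; `log_loss_change` is a hypothetical helper, and 0.14 is simply the value quoted above.

```python
import math

# Value of beta_epoch for epoch zero quoted in the post.
beta_epoch0 = 0.14


def log_loss_change(beta, delta_freeze):
    """Expected change in log loss for a change of delta_freeze in freeze_p."""
    return beta * delta_freeze


# Going from 0% to 100% frozen is a change of 1.0 on the 0-1 scale.
full_range = log_loss_change(beta_epoch0, 1.0)

# One percentage point more frozen is a change of 0.01 on the 0-1 scale.
one_point = log_loss_change(beta_epoch0, 0.01)

# Because y is the log of the loss, exponentiating gives a multiplicative
# factor on the loss itself (this assumes a natural log).
full_range_factor = math.exp(full_range)
one_point_factor = math.exp(one_point)

print(full_range, one_point)
print(full_range_factor, one_point_factor)
```

Under these assumptions, a slope of 0.14 is the change in log loss across the whole 0-1 range of `freeze_p`, while a single percentage point corresponds to roughly 0.0014.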

Lastly, if you could spoil me even more: I am interested in combining data from two different neural network training processes. They collect the same data but are based on different training sets. In this case, I think it would be good to model each epoch as its own cluster within its own data set, but allow some sharing of information between the two. Any thoughts on how I might use a new binary variable as a higher-order cluster, where 0 represents data from data set A and 1 represents data from data set B?
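
One possible way to write such a nested structure down, offered only as a sketch to anchor the question: each data set $d$ gets its own mean intercept and slope, drawn from shared population-level values, and each epoch within a data set is drawn around its data set's means. The symbols $a_d$, $b_d$, $\tau_a$, and $\tau_b$ are hypothetical additions that do not appear in the model above, and priors on the top-level parameters are omitted.

```latex
\begin{aligned}
a_d &\sim \mathrm{Normal}(a,\, \tau_a), &
b_d &\sim \mathrm{Normal}(b,\, \tau_b), &
d &\in \{A, B\} \\
\alpha_{d,j} &\sim \mathrm{Normal}(a_d,\, \sigma_a), &
\beta_{d,j} &\sim \mathrm{Normal}(b_d,\, \sigma_b), &
j &= 0, \dots, 4 \\
y_i &\sim \mathrm{Normal}\!\left(\alpha_{d[i],\,j[i]} + \beta_{d[i],\,j[i]}\, x_i,\; \sigma\right)
\end{aligned}
```

Here $d[i]$ is the binary data set indicator for observation $i$. With only two data sets, the between-data-set scales $\tau_a$ and $\tau_b$ would be informed by very little data, so their priors would likely carry a lot of weight.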