Question about multilevel structure for the likelihood

I am working on a modeling problem where the regression is perfectly suited for a hierarchical structure. However, after inspecting the data, it seems that the response variable has a different mean and standard deviation for each group. This might be a straightforward problem, but I am wondering how to extend the multilevel structure to the noise in the likelihood. E.g., if we have some x-data and some y-data with four groups, we might model the likelihood as:

intercept = pm.Normal("intercept", ..., dims="group")
slope     = pm.Normal("slope", ..., dims="group")

mu = intercept[group] + slope[group] * x_data

err = pm.HalfNormal("err", ...)

likelihood = pm.Normal("likelihood", mu=mu, sigma=err, observed=y_data, dims="obs_id")

This works fine in general, but it assumes a single noise parameter for all groups. Is it as simple as changing the noise to something like:

err = pm.__SOME_DISTRIBUTION__("err", ..., dims="group")
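For concreteness, a minimal sketch of the simplest version of what I mean (the HalfNormal and its scale are just placeholder choices, assuming a "group" coord of length four):

err = pm.HalfNormal("err", sigma=1.0, dims="group")

likelihood = pm.Normal("likelihood", mu=mu, sigma=err[group], observed=y_data, dims="obs_id")

The key change is indexing err[group] in the likelihood, the same way the intercept and slope are indexed.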

I wrote a full working example with simulated data. This question is based on a problem I am actually working on, but for ease of understanding I am working with the simulated set. I was able to get the above approach to work, but (a) I am not sure if it is a correct approach, and (b) I am not sure how to appropriately model the noise when I need a prior and a hyperprior.

Here is the notebook on Colab; I thought it was clearer than trying to fit all the code and figures into one long post.

Thanks!

You can use a hierarchical normal prior and exponentiate it, or a LogNormal, which is exactly the same.
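For example, a minimal sketch of both options (the names and hyperparameter values here are just placeholders, assuming a "group" coord):

# Option 1: hierarchical normal prior on the log scale, then exponentiate
log_err_mu    = pm.Normal("log_err_mu", mu=0, sigma=1)
log_err_sigma = pm.HalfNormal("log_err_sigma", 1)
log_err       = pm.Normal("log_err", mu=log_err_mu, sigma=log_err_sigma, dims="group")
err           = pm.Deterministic("err", pm.math.exp(log_err), dims="group")

# Option 2: the same thing written directly as a LogNormal
err = pm.LogNormal("err", mu=log_err_mu, sigma=log_err_sigma, dims="group")

Either way, err is guaranteed positive and is partially pooled across groups.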

Great, thanks for the reply! So you mean something like this:

# Normal prior, exponentiated
err_mu     = pm.Normal("err_mu", mu=0, sigma=10)
err_sigma  = pm.HalfNormal("err_sigma", 1)
# note: the RV named "err" lives on the log scale; the Python variable err is its exp
err        = pm.math.exp(pm.Normal("err", mu=err_mu, sigma=err_sigma, dims="group"))

# LogNormal for all?
err_mu    = pm.LogNormal("err_mu", mu=0, sigma=0.25)
err_sigma = pm.LogNormal("err_sigma", mu=0, sigma=0.25)
err       = pm.LogNormal("err", mu=err_mu, sigma=err_sigma, dims="group")

# Mean of distribution
mu = intercept[group] + slope[group] * x_data

# Likelihood
y_likelihood = pm.Normal("y_likelihood", mu=mu, sigma=err[group], observed=y_data, dims="obs_id")

or am I getting it entirely wrong?

On a more fundamental note, are there any suggested resources on how to model noise? Is it something that really makes a difference? EDIT: what I mean by "makes a difference": for some processes we may not have any idea what generates the noise in the likelihood, so in that case do we just test distributions until our model achieves good performance?

UPDATE:

I tried both

    # LogNormal distribution for the lam param
    sigma_err = pm.LogNormal("sigma_err", mu=0, sigma=0.5)
    # Exponential distribution for the likelihood noise
    err       = pm.Exponential("err", lam=sigma_err, dims="group")

    # All LogNormal
    err_mu    = pm.LogNormal("err_mu", mu=0, sigma=0.25)
    err_sigma = pm.LogNormal("err_sigma", mu=0, sigma=0.25)
    err       = pm.LogNormal("err", mu=err_mu, sigma=err_sigma, dims="group")

and both worked without divergences. Both converged on nearly the same solution, but neither quite accurately captures the true likelihood noise. I also tried

err_mu     = pm.Normal("err_mu", mu=0, sigma=10)
err_sigma  = pm.HalfNormal("err_sigma", 1)
err        = pm.math.exp(pm.Normal("err", mu=err_mu, sigma=err_sigma, dims="group"))

and that failed miserably with hundreds of divergences.
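I suspect that is the classic hierarchical funnel. Would a non-centered version of that last prior help? A sketch of what I mean (the offset construction and hyperparameters are my guesses, same "group" coord as before):

err_mu     = pm.Normal("err_mu", mu=0, sigma=1)
err_sigma  = pm.HalfNormal("err_sigma", 1)
# sample a standardized per-group offset, then shift, scale, and exponentiate
err_offset = pm.Normal("err_offset", mu=0, sigma=1, dims="group")
err        = pm.Deterministic("err", pm.math.exp(err_mu + err_sigma * err_offset), dims="group")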

Any input on this? Truthfully, the only intuition I have is that the noise should be positive. Otherwise, I am sort of guessing as to how to select an appropriate distribution.
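For reference, I am judging "captures the true noise" by comparing the posterior for err against the per-group sigmas used to simulate the data, along these lines (assuming the trace is stored in idata):

import arviz as az

# posterior summary of the per-group noise; compare the means/HDIs to the true sigmas
az.summary(idata, var_names=["err"])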