How to handle non-normalized data in a GLM?

Hello,

I am currently facing a normalization issue with a simple GLM. The linear model itself is straightforward, but I am struggling to fit two countries whose outcome variable differs by several orders of magnitude.

To clarify, my model looks like this:

\text{income} = \beta_{0,\text{country}} + \beta_{1,\text{country}} \cdot \text{years\_of\_education}

The code looks like this:

import numpy as np
import pymc as pm
from scipy import stats
import matplotlib.pyplot as plt
import arviz as az

np.random.seed(0)
n_samples = 1000
education_years = np.random.normal(10, 2, (n_samples, 2))  # one column per country
b1_real = np.array([-1, 5])
b0_real = np.array([5_000, 5])
# income, with half-normal noise (loc=0, scale=20)
y = b0_real + b1_real * education_years + stats.halfnorm.rvs(0, 20, (n_samples, 2))

with pm.Model() as model_country:
    b0 = pm.Normal("b0", mu=0, sigma=10, shape=2)
    b1 = pm.Normal("b1", mu=0, sigma=10, shape=2)
    sigma = pm.HalfNormal("sigma", sigma=10, shape=2)

    mu = b0 + b1 * education_years
    inc_mu = pm.Normal("inc_mu", mu=mu, sigma=sigma, observed=y, shape=y.shape)

    trace = pm.sample(draws=500, cores=4, target_accept=0.80)
    posterior_predictive = pm.sample_posterior_predictive(trace, var_names=["b0", "b1", "inc_mu", "sigma"])
    map_estimate = pm.find_MAP()  # renamed to avoid shadowing the built-in `map`

az.plot_forest(trace, combined=True, hdi_prob=0.95)


The problem is that the sampler struggles to recover the original parameters (b0_real, b1_real). I intentionally chose values that are quite different from each other, because they also differ substantially in the "real" dataset I am working with.

I have learned that normalization is typically a recommended step in data modeling. However, in my specific case I want to avoid it: certain experimental values and constraints make coefficients on the original scale much easier to interpret. In other words, I would like to keep the betas unnormalized.

Is there any way to adapt the model so that it can handle non-normalized datasets such as the one in my example? Would changing the NUTS sampler settings be the best alternative? Alternatively, would adding informative priors help resolve the issue, perhaps by performing a linear regression first and using those estimates to inform the priors? I would greatly appreciate any guidance you can provide.

For example, something like:

b0 = pm.Normal("b0", mu=np.mean(y, axis=0), sigma=10, shape=2)
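To make the "linear regression first" idea concrete, here is a minimal sketch of how the per-country ordinary-least-squares estimates could be obtained with `np.polyfit` and used to centre the priors. The variable names (`ols_b0`, `ols_b1`) and the prior widths are illustrative assumptions, not part of the original model:

```python
import numpy as np

# Regenerate synthetic data on the same scale as the example above.
rng = np.random.default_rng(0)
n_samples = 1000
education_years = rng.normal(10, 2, (n_samples, 2))
b1_real = np.array([-1, 5])
b0_real = np.array([5_000, 5])
# |N(0, 20)| noise as a stand-in for the half-normal noise in the post
y = b0_real + b1_real * education_years + np.abs(rng.normal(0, 20, (n_samples, 2)))

# One least-squares fit per country; polyfit returns [slope, intercept].
ols_b0 = np.empty(2)
ols_b1 = np.empty(2)
for c in range(2):
    ols_b1[c], ols_b0[c] = np.polyfit(education_years[:, c], y[:, c], deg=1)

# These estimates could then seed the PyMC priors, e.g. (hypothetical widths):
# b0 = pm.Normal("b0", mu=ols_b0, sigma=np.abs(ols_b0) * 0.5 + 10, shape=2)
# b1 = pm.Normal("b1", mu=ols_b1, sigma=np.abs(ols_b1) * 0.5 + 1, shape=2)
```

Note that because the noise is half-normal rather than zero-mean, the OLS intercepts will absorb the noise mean (about 20 * sqrt(2/pi) ≈ 16 here), so they will sit slightly above b0_real.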