I hope someone can help me please.
I’m fitting a Bayesian multiple linear regression with 7 features.
Some of the features have very low correlation with the target variable.
import pymc3 as pm

with pm.Model() as model_mlr_normal:
    # Intercept
    alpha = pm.Normal('alpha', mu=0, sd=25)
    # Slopes (one per feature column)
    beta = pm.Normal('beta', mu=0, sd=25, shape=len(data.columns[:-1]))
    # Error term
    eps = pm.HalfCauchy('eps', 25)
    # Expected value of outcome (MLR with vectors)
    mu = alpha + pm.math.dot(x, beta)
    # Likelihood
    tune_in_i = pm.Normal('tune_in_i', mu=mu, sd=eps, observed=y)
    # Posterior / create the trace
    trace_normal = pm.sample(chains=4)
I already tried changing mu and sd for the slope and intercept priors in different variations. I also increased tune to 3000 and more, and I am still getting divergences. Is it a bad thing to get divergences?
100.00% [8000/8000 02:58<00:00 Sampling 4 chains, 8 divergences]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 226 seconds.
There were 2 divergences after tuning. Increase target_accept or reparameterize.
There were 4 divergences after tuning. Increase target_accept or reparameterize.
There were 2 divergences after tuning. Increase target_accept or reparameterize.
It works! Thank you for your help.
trace_normal = pm.sample(target_accept = 0.9)
I saw it on forums before but didn’t pay attention that much.
I have another question, if you are familiar with this:
I have 2 features that are highly correlated (corr = 0.82). Does multicollinearity affect the model, and should I exclude one of the highly correlated features? (This is what I remember from regular linear regression, but I don’t know if it is the same with PyMC3.)
Multicollinearity will certainly affect the inferences you make. However, unlike in frequentist settings, the sampling process will not blow up. Instead, the ambiguity inherent in estimating the coefficients of collinear predictors will show up as dependence in the posterior. See here for an example in which 1 predictor is entered into a model twice (i.e., yielding 2 maximally collinear predictors). Estimation goes just fine, but there’s lots of ambiguity about the values of these parameters (as there should be).
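To make the "dependence in the posterior" point concrete, here is a rough sketch that avoids MCMC. It is not the linked example and not your model. It assumes a toy dataset with one predictor entered twice, a known noise sd, and Normal(0, 25) priors on the slopes, so the posterior can be written in closed form with NumPy (the standard conjugate result for linear regression). All names and values below (x1, true_effect, sigma, tau) are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: the same predictor entered twice, so the two
# columns of X are maximally collinear.
n = 200
x1 = rng.normal(size=n)
X = np.column_stack([x1, x1])
true_effect = 2.0   # assumed total effect of the shared predictor
sigma = 1.0         # noise sd, treated as known to keep the math closed-form
y = true_effect * x1 + rng.normal(scale=sigma, size=n)

# Independent Normal(0, tau) priors on both slopes (no intercept here).
tau = 25.0

# Conjugate result: the posterior over the slopes is N(post_mean, post_cov).
precision = X.T @ X / sigma**2 + np.eye(2) / tau**2
post_cov = np.linalg.inv(precision)
post_mean = post_cov @ (X.T @ y) / sigma**2

# Marginal uncertainty of each slope, and the correlation between them.
post_sd = np.sqrt(np.diag(post_cov))
post_corr = post_cov[0, 1] / (post_sd[0] * post_sd[1])

# Uncertainty of the sum of the two slopes (the combined effect).
w = np.ones(2)
sum_mean = post_mean.sum()
sum_sd = np.sqrt(w @ post_cov @ w)

print("posterior sd of each slope:", post_sd)
print("posterior correlation between slopes:", post_corr)
print("posterior mean and sd of slope sum:", sum_mean, sum_sd)
```

Under these assumptions each slope on its own is very uncertain and the two are almost perfectly negatively correlated, while their sum is pinned down tightly by the data. With a correlation of 0.82 instead of 1 the effect would be milder, but the same pattern is what to look for in a pair plot of the two beta entries from your trace.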