Unexpected Lack of Convergence of Model with Multicollinearity

Edderic · May 24, 2018, 2:40pm

Hi!

I’m working on Richard Mcelreath’s book Statistical Rethinking for self-study and I’m answering end of chapter exercises using PyMC3.

One question involves creating a model with highly correlated data. The idea is to predict height with the left leg and the right leg. Data was generated as follows:

height = np.random.normal(loc=10, scale=2, size=100)
leg_prop = np.random.uniform(low=0.4, high=0.5, size=100)
leg_left = leg_prop * height + np.random.normal(loc=0, scale=0.02, size=100)
leg_right = leg_prop * height + np.random.normal(loc=0, scale=0.02, size=100)

highly_corr_df = pd.DataFrame({
    'height': height,
    'leg_left': leg_left,
    'leg_right': leg_right
})

The first model is simple and converges:

with pm.Model() as highly_corr_model:
    bL = pm.Normal('bL', 2, 10)
    bR = pm.Normal('bR', 2, 10)
    alpha = pm.Normal('alpha', 10, 100)
    sigma = pm.HalfCauchy('sigma', 1)
    mu = bL * highly_corr_df['leg_left'] + bR * highly_corr_df['leg_right'] + alpha
    height = pm.Normal('height', mu, sigma, observed=highly_corr_df['height'])

    trace_highly_corr_model = pm.sample(4000, tune=1000, start={
        'bL': 0,
        'bR': 0,
        'alpha': 10,
        'sigma': 1
    })

However, the question asks what happens if bR was only positive. Here’s the R equivalent:

m5.8s2 <- map2stan(
    alist(
        height ~ dnorm(mu, sigma),
        mu <- a + bl*leg_left + br*leg_right,
        a ~ dnorm(10,100),
        bl ~ dnorm(2, 10),
        br ~ dnorm(2, 10) & T[0,],
        sigma ~ dcauchy(0, 1),
    ),
    data=d, chains=4,
    start=list(a=10, bl=0, br=0, sigma=1)
)

Here’s my code that uses Bound to ensure that bR can only be positive:

with pm.Model() as highly_corr_model_truncated:
    bL = pm.Normal('bL', 2, 10, testval=0)
    BoundedNormal = pm.Bound(pm.Normal, lower=0.0)
    bR = BoundedNormal('bR', mu=2, sd=10, testval=0.1) 
    alpha = pm.Normal('alpha', 10, 100, testval=10)
    sigma = pm.HalfCauchy('sigma', 1, testval=1)
    mu = bL * highly_corr_df['leg_left'] + bR * highly_corr_df['leg_right'] + alpha
    height = pm.Normal('height', mu, sigma, observed=highly_corr_df['height'])

    trace_highly_corr_model_truncated = pm.sample(4000, tune=1000)

This model has convergence issues: !

Any ideas why this is happening? I don’t know if this is a bug or if I wrote code incorrectly. Any help would be appreciated! Thanks.

junpenglao · May 24, 2018, 8:50pm

The implementation looks correct to me, so the unidentifiablitly might be an issue of the model itself.
Did you try to fit the same model/data in R using rethinking?

Topic		Replies	Views
Convergence with noncentered hierarchical model Questions	6	769	July 24, 2021
Random Intercepts and Slopes (Correlated) Convergence Issues Questions	16	1905	March 11, 2019
Hierarchical Linear Regression Model -- why is it not converging? Questions	3	1539	March 14, 2018
Fitting a complex model with a large dataset: chain contains only diverging samples Questions	5	1231	December 12, 2019
Hierarchical model where convergence is difficult: chains look bad. Any tips? Questions	6	985	January 10, 2019

Unexpected Lack of Convergence of Model with Multicollinearity

Related topics