How does 'tune' parameter impact Rhat values?

Greetings all,

I’m struggling with the fluctuating Rhat values. As far as I know, the default value for the tune is 1000, yet the problem is, the higher the tune value, the higher Rhat outcome (particularly with random effects) I get or vice versa. Is this a problem or am I missing something? The value worked so far is 100. I get slightly similar outputs with tune = 50.

Also accepted value for Rhat was initially 1.05, yet later, Vehtari et al. stated it would be better if Rhat < 1.01. How should I deal with values like 1.03?

# Updated list of tense-aspect structures
structures = ['Past_Perf_Prog', 'Past_Prog', 'Present_Simple', 'Present_Perf',
              'Present_Perf_Prog', 'Present_Prog', 'Past_Simple', 'Past_Perf']

# CEFR levels paired for modeling
cefr_pairs = [('A1', 'A2'), ('B1', 'B2'), ('C1',)]

with pm.Model() as frequency_model:
    # Hyperprior for global mean
    mu_alpha = pm.Normal('mu_alpha', mu=0., sigma=10)
    
    # Random intercepts for native_language
    alpha_native_lang = pm.Normal('alpha_native_lang', mu=0, sigma=1, shape=len(native_language_mapping))

    # Random intercepts for aspect_combined
    alpha_aspect = pm.Normal('alpha_aspect', mu=0, sigma=1, shape=len(aspect_mapping))

    # Fixed effects for CEFR levels with custom priors
    beta_cefr_A1 = pm.Normal('beta_cefr_A1', mu=0, sigma=0.25)  # Updated strong prior for A1
    beta_cefr_A2 = pm.Normal('beta_cefr_A2', mu=0, sigma=0.25)  # Updated strong prior for A2
    beta_cefr_B1 = pm.Normal('beta_cefr_B1', mu=0, sigma=0.5)   # Updated moderate prior for B1
    beta_cefr_B2 = pm.Normal('beta_cefr_B2', mu=0, sigma=0.5)   # Updated moderate prior for B2
    beta_cefr_C1 = pm.Normal('beta_cefr_C1', mu=0, sigma=0.75)  # Updated weak to moderate prior for C1

    # Interaction terms for each tense-aspect structure and CEFR level
    for structure in structures:
        for cefr_pair in cefr_pairs:
            for level in cefr_pair:
                if level in ['A1', 'A2']:
                    if structure in ['Present_Simple', 'Present_Prog', 'Past_Simple']:
                        sigma_value = 0.25  # High confidence for basic structures
                    elif structure in ['Present_Perf', 'Past_Perf', 'Past_Perf_Prog']:
                        sigma_value = 1.5  # Low confidence for advanced structures
                    else:
                        sigma_value = 1.0  # Moderate confidence for other structures
                elif level in ['B1', 'B2']:
                    if structure in ['Present_Perf', 'Past_Prog', 'Past_Perf']:
                        sigma_value = 0.5  # Moderate confidence for intermediate structures
                    else:
                        sigma_value = 0.75  # Default for other structures at these levels
                else:  # For C1
                    sigma_value = 0.75  # Moderate confidence for all structures at advanced level
                
                locals()[f"beta_{level}_{structure}"] = pm.Normal(f'beta_{level}_{structure}', mu=0, sigma=sigma_value)

    # Expected value for the target variable (frequency of each structure)
    mu = mu_alpha + alpha_native_lang[grouped_data['native_language_idx'].values] + alpha_aspect[grouped_data['aspect_idx'].values]

    # Adding fixed effects for each CEFR level
    for level in ['A1', 'A2', 'B1', 'B2', 'C1']:
        mu += locals()[f"beta_cefr_{level}"] * (grouped_data['cefr_idx'].values == cefr_mapping[level])

    # Adding interaction terms
    for structure in structures:
        for level in ['A1', 'A2', 'B1', 'B2', 'C1']:
            mu += locals()[f"beta_{level}_{structure}"] * (grouped_data['cefr_idx'].values == cefr_mapping[level]) * (grouped_data['structure'] == structure)

    # Likelihood (Poisson distribution for count/frequency data)
    obs = pm.Poisson('obs', mu=np.exp(mu), observed=grouped_data['frequency'])
    
    # Sampling from the model (adjust the number of samples as needed)
    idata = pm.sample(draws = 2000, tune = 100, chains = 4, random_seed = 42, progressbar = True, nuts_sampler = 'pymc',
                  discard_tuned_samples=True, compute_convergence_checks=True,return_inferencedata=True)
||mean|sd|hdi_3%|hdi_97%|mcse_mean|mcse_sd|ess_bulk|ess_tail|r_hat|
|---|---|---|---|---|---|---|---|---|---|
|mu_alpha|4.791|0.643|3.574|5.998|0.045|0.032|206.0|443.0|1.01|
|alpha_native_lang[0]|-0.752|0.313|-1.335|-0.154|0.019|0.013|270.0|523.0|1.03|
|alpha_native_lang[1]|-0.271|0.313|-0.864|0.318|0.019|0.013|270.0|519.0|1.03|
|alpha_native_lang[2]|0.120|0.313|-0.456|0.726|0.019|0.013|270.0|522.0|1.03|
|alpha_native_lang[3]|-0.198|0.313|-0.766|0.417|0.019|0.013|270.0|528.0|1.03|
|alpha_native_lang[4]|-0.985|0.313|-1.575|-0.391|0.019|0.013|270.0|526.0|1.03|
|alpha_native_lang[5]|1.320|0.313|0.734|1.916|0.019|0.014|270.0|526.0|1.03|
|alpha_native_lang[6]|1.682|0.313|1.100|2.283|0.019|0.014|270.0|528.0|1.03|
|alpha_native_lang[7]|0.298|0.313|-0.290|0.891|0.019|0.013|270.0|524.0|1.03|
|alpha_native_lang[8]|0.215|0.313|-0.373|0.810|0.019|0.013|270.0|526.0|1.03|
|alpha_native_lang[9]|-1.334|0.313|-1.930|-0.749|0.019|0.013|270.0|517.0|1.03|
|alpha_aspect[0]|1.430|0.527|0.509|2.509|0.038|0.027|198.0|451.0|1.01|
|alpha_aspect[1]|-2.101|0.527|-3.018|-1.021|0.038|0.027|198.0|453.0|1.01|
|alpha_aspect[2]|-1.960|0.527|-2.874|-0.881|0.038|0.027|198.0|439.0|1.01|
|alpha_aspect[3]|2.643|0.527|1.722|3.722|0.038|0.027|198.0|450.0|1.01|

Well, it seems like the issue was due to overdispersion. The issue is now solved.

1 Like

Could you post the solution you landed on? I’m sure some future visitor would be interested!

Sure thing! Since the reason is overdispersion, I’ve used NegativeBinomial rather than Poisson.

So I’ve added the prior for the alpha parameter,

# Prior for the dispersion parameter
    alpha = pm.Gamma('alpha', alpha=2, beta=0.1)

and then run pm.NegativeBinomial()

obs = pm.NegativeBinomial('obs', mu=np.exp(mu), alpha=alpha, observed=grouped_data['frequency'])

The final complete model is like this;

with pm.Model() as frequency_model:
    # Hyperprior for global mean
    mu_alpha = pm.Normal('mu_alpha', mu=0., sigma=10)
    
    # Random intercepts for native_language
    alpha_native_lang = pm.Normal('alpha_native_lang', mu=0, sigma=1, shape=len(native_language_mapping))

    # Random intercepts for aspect_combined
    alpha_aspect = pm.Normal('alpha_aspect', mu=0, sigma=1, shape=len(aspect_mapping))

    # Fixed effects for CEFR levels with custom priors
    beta_cefr_A1 = pm.Normal('beta_cefr_A1', mu=0, sigma=0.25)  # Updated strong prior for A1
    beta_cefr_A2 = pm.Normal('beta_cefr_A2', mu=0, sigma=0.25)  # Updated strong prior for A2
    beta_cefr_B1 = pm.Normal('beta_cefr_B1', mu=0, sigma=0.5)   # Updated moderate prior for B1
    beta_cefr_B2 = pm.Normal('beta_cefr_B2', mu=0, sigma=0.5)   # Updated moderate prior for B2
    beta_cefr_C1 = pm.Normal('beta_cefr_C1', mu=0, sigma=0.75)  # Updated weak to moderate prior for C1

    # Interaction terms for each tense-aspect structure and CEFR level
    for structure in structures:
        for cefr_pair in cefr_pairs:
            for level in cefr_pair:
                if level in ['A1', 'A2']:
                    if structure in ['Present_Simple', 'Present_Prog', 'Past_Simple']:
                        sigma_value = 0.25  # High confidence for basic structures
                    elif structure in ['Present_Perf', 'Past_Perf', 'Past_Perf_Prog']:
                        sigma_value = 1.5  # Low confidence for advanced structures
                    else:
                        sigma_value = 1.0  # Moderate confidence for other structures
                elif level in ['B1', 'B2']:
                    if structure in ['Present_Perf', 'Past_Prog', 'Past_Perf']:
                        sigma_value = 0.5  # Moderate confidence for intermediate structures
                    else:
                        sigma_value = 0.75  # Default for other structures at these levels
                else:  # For C1
                    sigma_value = 0.75  # Moderate confidence for all structures at advanced level
                
                locals()[f"beta_{level}_{structure}"] = pm.Normal(f'beta_{level}_{structure}', mu=0, sigma=sigma_value)

    # Expected value for the target variable (frequency of each structure)
    mu = mu_alpha + alpha_native_lang[grouped_data['native_language_idx'].values] + alpha_aspect[grouped_data['aspect_idx'].values]

    # Adding fixed effects for each CEFR level
    for level in ['A1', 'A2', 'B1', 'B2', 'C1']:
        mu += locals()[f"beta_cefr_{level}"] * (grouped_data['cefr_idx'].values == cefr_mapping[level])

    # Adding interaction terms
    for structure in structures:
        for level in ['A1', 'A2', 'B1', 'B2', 'C1']:
            mu += locals()[f"beta_{level}_{structure}"] * (grouped_data['cefr_idx'].values == cefr_mapping[level]) * (grouped_data['structure'] == structure)
            
    # Prior for the dispersion parameter
    alpha = pm.Gamma('alpha', alpha=2, beta=0.1)

    # Likelihood (NegativeBinomial distribution for count/frequency data)
    obs = pm.NegativeBinomial('obs', mu=np.exp(mu), alpha=alpha, observed=grouped_data['frequency'])

    
    # Sampling from the model (adjust the number of samples as needed)
    idata = pm.sample(draws = 2000, tune = 1000, chains = 4, random_seed = 42, progressbar = True, nuts_sampler = 'pymc',
                  discard_tuned_samples=True, compute_convergence_checks=True,return_inferencedata=True)
2 Likes