How does 'tune' parameter impact Rhat values?

fatihbozdag · December 10, 2023, 8:39pm

Greetings all,

I’m struggling with the fluctuating Rhat values. As far as I know, the default value for the tune is 1000, yet the problem is, the higher the tune value, the higher Rhat outcome (particularly with random effects) I get or vice versa. Is this a problem or am I missing something? The value worked so far is 100. I get slightly similar outputs with tune = 50.

Also accepted value for Rhat was initially 1.05, yet later, Vehtari et al. stated it would be better if Rhat < 1.01. How should I deal with values like 1.03?

# Updated list of tense-aspect structures
structures = ['Past_Perf_Prog', 'Past_Prog', 'Present_Simple', 'Present_Perf',
              'Present_Perf_Prog', 'Present_Prog', 'Past_Simple', 'Past_Perf']

# CEFR levels paired for modeling
cefr_pairs = [('A1', 'A2'), ('B1', 'B2'), ('C1',)]

with pm.Model() as frequency_model:
    # Hyperprior for global mean
    mu_alpha = pm.Normal('mu_alpha', mu=0., sigma=10)
    
    # Random intercepts for native_language
    alpha_native_lang = pm.Normal('alpha_native_lang', mu=0, sigma=1, shape=len(native_language_mapping))

    # Random intercepts for aspect_combined
    alpha_aspect = pm.Normal('alpha_aspect', mu=0, sigma=1, shape=len(aspect_mapping))

    # Fixed effects for CEFR levels with custom priors
    beta_cefr_A1 = pm.Normal('beta_cefr_A1', mu=0, sigma=0.25)  # Updated strong prior for A1
    beta_cefr_A2 = pm.Normal('beta_cefr_A2', mu=0, sigma=0.25)  # Updated strong prior for A2
    beta_cefr_B1 = pm.Normal('beta_cefr_B1', mu=0, sigma=0.5)   # Updated moderate prior for B1
    beta_cefr_B2 = pm.Normal('beta_cefr_B2', mu=0, sigma=0.5)   # Updated moderate prior for B2
    beta_cefr_C1 = pm.Normal('beta_cefr_C1', mu=0, sigma=0.75)  # Updated weak to moderate prior for C1

    # Interaction terms for each tense-aspect structure and CEFR level
    for structure in structures:
        for cefr_pair in cefr_pairs:
            for level in cefr_pair:
                if level in ['A1', 'A2']:
                    if structure in ['Present_Simple', 'Present_Prog', 'Past_Simple']:
                        sigma_value = 0.25  # High confidence for basic structures
                    elif structure in ['Present_Perf', 'Past_Perf', 'Past_Perf_Prog']:
                        sigma_value = 1.5  # Low confidence for advanced structures
                    else:
                        sigma_value = 1.0  # Moderate confidence for other structures
                elif level in ['B1', 'B2']:
                    if structure in ['Present_Perf', 'Past_Prog', 'Past_Perf']:
                        sigma_value = 0.5  # Moderate confidence for intermediate structures
                    else:
                        sigma_value = 0.75  # Default for other structures at these levels
                else:  # For C1
                    sigma_value = 0.75  # Moderate confidence for all structures at advanced level
                
                locals()[f"beta_{level}_{structure}"] = pm.Normal(f'beta_{level}_{structure}', mu=0, sigma=sigma_value)

    # Expected value for the target variable (frequency of each structure)
    mu = mu_alpha + alpha_native_lang[grouped_data['native_language_idx'].values] + alpha_aspect[grouped_data['aspect_idx'].values]

    # Adding fixed effects for each CEFR level
    for level in ['A1', 'A2', 'B1', 'B2', 'C1']:
        mu += locals()[f"beta_cefr_{level}"] * (grouped_data['cefr_idx'].values == cefr_mapping[level])

    # Adding interaction terms
    for structure in structures:
        for level in ['A1', 'A2', 'B1', 'B2', 'C1']:
            mu += locals()[f"beta_{level}_{structure}"] * (grouped_data['cefr_idx'].values == cefr_mapping[level]) * (grouped_data['structure'] == structure)

    # Likelihood (Poisson distribution for count/frequency data)
    obs = pm.Poisson('obs', mu=np.exp(mu), observed=grouped_data['frequency'])
    
    # Sampling from the model (adjust the number of samples as needed)
    idata = pm.sample(draws = 2000, tune = 100, chains = 4, random_seed = 42, progressbar = True, nuts_sampler = 'pymc',
                  discard_tuned_samples=True, compute_convergence_checks=True,return_inferencedata=True)

||mean|sd|hdi_3%|hdi_97%|mcse_mean|mcse_sd|ess_bulk|ess_tail|r_hat|
|---|---|---|---|---|---|---|---|---|---|
|mu_alpha|4.791|0.643|3.574|5.998|0.045|0.032|206.0|443.0|1.01|
|alpha_native_lang[0]|-0.752|0.313|-1.335|-0.154|0.019|0.013|270.0|523.0|1.03|
|alpha_native_lang[1]|-0.271|0.313|-0.864|0.318|0.019|0.013|270.0|519.0|1.03|
|alpha_native_lang[2]|0.120|0.313|-0.456|0.726|0.019|0.013|270.0|522.0|1.03|
|alpha_native_lang[3]|-0.198|0.313|-0.766|0.417|0.019|0.013|270.0|528.0|1.03|
|alpha_native_lang[4]|-0.985|0.313|-1.575|-0.391|0.019|0.013|270.0|526.0|1.03|
|alpha_native_lang[5]|1.320|0.313|0.734|1.916|0.019|0.014|270.0|526.0|1.03|
|alpha_native_lang[6]|1.682|0.313|1.100|2.283|0.019|0.014|270.0|528.0|1.03|
|alpha_native_lang[7]|0.298|0.313|-0.290|0.891|0.019|0.013|270.0|524.0|1.03|
|alpha_native_lang[8]|0.215|0.313|-0.373|0.810|0.019|0.013|270.0|526.0|1.03|
|alpha_native_lang[9]|-1.334|0.313|-1.930|-0.749|0.019|0.013|270.0|517.0|1.03|
|alpha_aspect[0]|1.430|0.527|0.509|2.509|0.038|0.027|198.0|451.0|1.01|
|alpha_aspect[1]|-2.101|0.527|-3.018|-1.021|0.038|0.027|198.0|453.0|1.01|
|alpha_aspect[2]|-1.960|0.527|-2.874|-0.881|0.038|0.027|198.0|439.0|1.01|
|alpha_aspect[3]|2.643|0.527|1.722|3.722|0.038|0.027|198.0|450.0|1.01|

fatihbozdag · December 11, 2023, 4:55pm

Well, it seems like the issue was due to overdispersion. The issue is now solved.

jessegrabowski · December 11, 2023, 5:16pm

Could you post the solution you landed on? I’m sure some future visitor would be interested!

fatihbozdag · December 11, 2023, 9:34pm

Sure thing! Since the reason is overdispersion, I’ve used NegativeBinomial rather than Poisson.

So I’ve added the prior for the alpha parameter,

# Prior for the dispersion parameter
    alpha = pm.Gamma('alpha', alpha=2, beta=0.1)

and then run pm.NegativeBinomial()

obs = pm.NegativeBinomial('obs', mu=np.exp(mu), alpha=alpha, observed=grouped_data['frequency'])

The final complete model is like this;

with pm.Model() as frequency_model:
    # Hyperprior for global mean
    mu_alpha = pm.Normal('mu_alpha', mu=0., sigma=10)
    
    # Random intercepts for native_language
    alpha_native_lang = pm.Normal('alpha_native_lang', mu=0, sigma=1, shape=len(native_language_mapping))

    # Random intercepts for aspect_combined
    alpha_aspect = pm.Normal('alpha_aspect', mu=0, sigma=1, shape=len(aspect_mapping))

    # Fixed effects for CEFR levels with custom priors
    beta_cefr_A1 = pm.Normal('beta_cefr_A1', mu=0, sigma=0.25)  # Updated strong prior for A1
    beta_cefr_A2 = pm.Normal('beta_cefr_A2', mu=0, sigma=0.25)  # Updated strong prior for A2
    beta_cefr_B1 = pm.Normal('beta_cefr_B1', mu=0, sigma=0.5)   # Updated moderate prior for B1
    beta_cefr_B2 = pm.Normal('beta_cefr_B2', mu=0, sigma=0.5)   # Updated moderate prior for B2
    beta_cefr_C1 = pm.Normal('beta_cefr_C1', mu=0, sigma=0.75)  # Updated weak to moderate prior for C1

    # Interaction terms for each tense-aspect structure and CEFR level
    for structure in structures:
        for cefr_pair in cefr_pairs:
            for level in cefr_pair:
                if level in ['A1', 'A2']:
                    if structure in ['Present_Simple', 'Present_Prog', 'Past_Simple']:
                        sigma_value = 0.25  # High confidence for basic structures
                    elif structure in ['Present_Perf', 'Past_Perf', 'Past_Perf_Prog']:
                        sigma_value = 1.5  # Low confidence for advanced structures
                    else:
                        sigma_value = 1.0  # Moderate confidence for other structures
                elif level in ['B1', 'B2']:
                    if structure in ['Present_Perf', 'Past_Prog', 'Past_Perf']:
                        sigma_value = 0.5  # Moderate confidence for intermediate structures
                    else:
                        sigma_value = 0.75  # Default for other structures at these levels
                else:  # For C1
                    sigma_value = 0.75  # Moderate confidence for all structures at advanced level
                
                locals()[f"beta_{level}_{structure}"] = pm.Normal(f'beta_{level}_{structure}', mu=0, sigma=sigma_value)

    # Expected value for the target variable (frequency of each structure)
    mu = mu_alpha + alpha_native_lang[grouped_data['native_language_idx'].values] + alpha_aspect[grouped_data['aspect_idx'].values]

    # Adding fixed effects for each CEFR level
    for level in ['A1', 'A2', 'B1', 'B2', 'C1']:
        mu += locals()[f"beta_cefr_{level}"] * (grouped_data['cefr_idx'].values == cefr_mapping[level])

    # Adding interaction terms
    for structure in structures:
        for level in ['A1', 'A2', 'B1', 'B2', 'C1']:
            mu += locals()[f"beta_{level}_{structure}"] * (grouped_data['cefr_idx'].values == cefr_mapping[level]) * (grouped_data['structure'] == structure)
            
    # Prior for the dispersion parameter
    alpha = pm.Gamma('alpha', alpha=2, beta=0.1)

    # Likelihood (NegativeBinomial distribution for count/frequency data)
    obs = pm.NegativeBinomial('obs', mu=np.exp(mu), alpha=alpha, observed=grouped_data['frequency'])

    
    # Sampling from the model (adjust the number of samples as needed)
    idata = pm.sample(draws = 2000, tune = 1000, chains = 4, random_seed = 42, progressbar = True, nuts_sampler = 'pymc',
                  discard_tuned_samples=True, compute_convergence_checks=True,return_inferencedata=True)

Topic		Replies	Views
Large R-Hat Statistics	3	79	May 30, 2025
Not divergences but high R hat v5 development	1	101	November 8, 2024
Bad r-hat on intercept only v5 modeling	7	1089	August 18, 2022
Very large r_hat of all estimated parameters	3	88	October 1, 2024
Very High Rhat when fitting Pymc-BART model v5 bart	7	1018	November 14, 2023

How does 'tune' parameter impact Rhat values?

Related topics