Is my model correct?

Zaid_Abubaker · September 29, 2024, 7:40am

Hello!

I’ve recently been building a Bayesian model to infer the chronological order of a literary text from its features, with ‘time’ as the latent variable.
I don’t have much background in Bayesian statistics, but I put this model together based on videos I’ve watched and articles I’ve read:

sura_order = ['sura_32', 'sura_45', 'sura_30', 'sura_12', 'sura_35', 'sura_13']

sura_labels = ['sura_6', 'sura_7', 'sura_10', 'sura_11', 'sura_12', 'sura_13', 'sura_14',
               'sura_16', 'sura_17', 'sura_18', 'sura_28', 'sura_29', 'sura_30', 'sura_31',
               'sura_32', 'sura_34', 'sura_35', 'sura_39', 'sura_40', 'sura_41', 'sura_42',
               'sura_45', 'sura_46']
sura_indices = [sura_labels.index(sura) for sura in sura_order]

# Priors for the texts
prior_mu = np.zeros(len(sura_labels))
prior_sigma = np.ones(len(sura_labels)) * 0.2

with pm.Model() as model:

    time = pm.Normal('time', mu=prior_mu, sigma=prior_sigma, shape=len(sura_labels))

    MVL_obs = pm.Normal('MVL_obs', mu=time, sigma=0.025, observed=data['MVL'])

    Sura_Length_obs = pm.Normal('Sura_Length_obs', mu=time, sigma=0.15, observed=data['Sura_Length'])

    Structural_Complexity_obs = pm.Normal('Structural_Complexity_obs', mu=time, sigma=0.15, observed=data['Structural_Complexity'])

    SD_obs = pm.Normal('SD_obs', mu=time, sigma=0.05, observed=data['SD'])

    # Sampling
    trace = pm.sample(1000, tune=1000, target_accept=0.9)

My question is:
Is my model correct? In particular, is using the latent variable ‘time’ as the mean of each observed variable the best approach?

Thanks.
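
For reference, the model in the code above can be written out as follows. This is a direct transcription of the code, not a recommendation; the symbols t_i (the latent time of sura i), y_ij (feature j of sura i) and σ_j (the fixed noise scale of feature j) are notation introduced here for illustration.

```latex
\begin{aligned}
t_i &\sim \mathcal{N}\!\left(0,\ 0.2^2\right), \qquad i = 1, \dots, 23,\\
y_{ij} &\sim \mathcal{N}\!\left(t_i,\ \sigma_j^2\right), \qquad
\sigma_{\mathrm{MVL}} = 0.025,\quad
\sigma_{\mathrm{Length}} = 0.15,\quad
\sigma_{\mathrm{Complexity}} = 0.15,\quad
\sigma_{\mathrm{SD}} = 0.05 .
\end{aligned}
```

Written this way, the model states that all four features of sura i share the same expected value t_i, so every feature is assumed to be measured on the same scale as `time`.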

ckrapu · September 30, 2024, 7:16am

We’re going to need quite a bit more information to give you any advice here. Can you give us a fully reproducible example, including some data?

What is your data variable?

Zaid_Abubaker · October 2, 2024, 4:25pm

Hi, sure!

data = pd.DataFrame({
    'Sura_Length': [167, 195, 109, 123, 111, 44, 52, 106, 110, 105, 88, 69, 60, 31, 30, 54, 45, 72, 84, 53, 50, 36, 34],
    'MVL': [116.46, 104.26, 104.36, 96.98, 99.41, 123.27, 99.43, 93.25, 90.98, 95.36, 101.33, 92.35, 87.18, 100.27, 77.27,
            99.31, 108.98, 102.53, 90.25, 95.32, 105.56, 86.33, 116.06],
    'Structural_Complexity': [31, 38, 17, 27, 15, 25, 17, 24, 22, 22, 19, 25, 23, 19, 18, 36, 23, 26, 30, 15, 23, 9, 15],
    'SD': [56.42, 58.8, 49.36, 40.57, 47.69, 56.13, 53.15, 37.87, 31.89, 49.62, 36.72, 34.91, 40.37, 42.68, 28.59, 43.41,
           52.77, 51.48, 42.87, 40.81, 44.68, 29.01, 50.46]
})

Here’s my data.
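
With the data visible, one thing that may be worth checking is scale: the raw features take values far from zero (MVL is around 100, for instance), while the model gives `time` a Normal(0, 0.2) prior and uses it directly as the mean of every feature. Below is a minimal sketch, using only NumPy, that reports how far each raw column sits from the prior and what the column looks like after z-scoring. The helper `standardize` and the constant `PRIOR_SD` are names made up for this illustration, and whether z-scoring is the right preprocessing for this analysis is an assumption, not something established in this thread.

```python
import numpy as np

# Feature values copied from the post above.
features = {
    "Sura_Length": [167, 195, 109, 123, 111, 44, 52, 106, 110, 105, 88, 69,
                    60, 31, 30, 54, 45, 72, 84, 53, 50, 36, 34],
    "MVL": [116.46, 104.26, 104.36, 96.98, 99.41, 123.27, 99.43, 93.25,
            90.98, 95.36, 101.33, 92.35, 87.18, 100.27, 77.27, 99.31,
            108.98, 102.53, 90.25, 95.32, 105.56, 86.33, 116.06],
    "Structural_Complexity": [31, 38, 17, 27, 15, 25, 17, 24, 22, 22, 19, 25,
                              23, 19, 18, 36, 23, 26, 30, 15, 23, 9, 15],
    "SD": [56.42, 58.8, 49.36, 40.57, 47.69, 56.13, 53.15, 37.87, 31.89,
           49.62, 36.72, 34.91, 40.37, 42.68, 28.59, 43.41, 52.77, 51.48,
           42.87, 40.81, 44.68, 29.01, 50.46],
}

PRIOR_SD = 0.2  # prior sigma on `time` in the posted model (prior mean is 0)


def standardize(values):
    """Return z-scores: zero mean and unit sample standard deviation."""
    x = np.asarray(values, dtype=float)
    return (x - x.mean()) / x.std(ddof=1)


for name, values in features.items():
    x = np.asarray(values, dtype=float)
    z = standardize(x)
    # Distance of the raw column mean from the prior mean (0), in prior SDs.
    prior_sds_away = x.mean() / PRIOR_SD
    print(f"{name}: raw mean = {x.mean():.2f} "
          f"({prior_sds_away:.0f} prior SDs from 0), "
          f"standardized range = [{z.min():.2f}, {z.max():.2f}]")
```

If the standardized columns were used as the observed values, they would at least live on a scale comparable to the prior on `time`; whether each feature should also get its own sign and strength of dependence on `time` is a separate modeling question.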

Zaid_Abubaker · October 2, 2024, 4:33pm

Here’s the entire model if needed as well:

import pymc as pm
import numpy as np
import pandas as pd
import arviz as az
import matplotlib.pyplot as plt

# Data of the text
data = pd.DataFrame({

    'Sura_Length': [167, 195, 109, 123, 111, 44, 52, 106, 110, 105, 88, 69, 60, 31, 30, 54, 45, 72, 84, 53, 50, 36, 34],

    'MVL': [116.46, 104.26, 104.36, 96.98, 99.41, 123.27, 99.43, 93.25, 90.98, 95.36, 101.33, 92.35, 87.18, 100.27, 77.27,
            99.31, 108.98, 102.53, 90.25, 95.32, 105.56, 86.33, 116.06],

    'Structural_Complexity': [31, 38, 17, 27, 15, 25, 17, 24, 22, 22, 19, 25, 23, 19, 18, 36, 23, 26, 30, 15, 23, 9, 15],

    'SD': [56.42, 58.8, 49.36, 40.57, 47.69, 56.13, 53.15, 37.87, 31.89, 49.62, 36.72, 34.91, 40.37, 42.68, 28.59, 43.41,
           52.77, 51.48, 42.87, 40.81, 44.68, 29.01, 50.46]
})

# Known order (not consecutive)

sura_order = ['sura_32', 'sura_45', 'sura_30', 'sura_12', 'sura_35', 'sura_13']

sura_labels = ['sura_6', 'sura_7', 'sura_10', 'sura_11', 'sura_12', 'sura_13', 'sura_14',
               'sura_16', 'sura_17', 'sura_18', 'sura_28', 'sura_29', 'sura_30', 'sura_31',
               'sura_32', 'sura_34', 'sura_35', 'sura_39', 'sura_40', 'sura_41', 'sura_42',
               'sura_45', 'sura_46']
sura_indices = [sura_labels.index(sura) for sura in sura_order]

# Priors
prior_mu = np.zeros(len(sura_labels))
prior_sigma = np.ones(len(sura_labels)) * 0.2

with pm.Model() as model:
    # Using normal priors for time
    time = pm.Normal('time', mu=prior_mu, sigma=prior_sigma, shape=len(sura_labels))

    MVL_obs = pm.Normal('MVL_obs', mu=time, sigma=0.025, observed=data['MVL'])
    Sura_Length_obs = pm.Normal('Sura_Length_obs', mu=time, sigma=0.15, observed=data['Sura_Length'])
    Structural_Complexity_obs = pm.Normal('Structural_Complexity_obs', mu=time, sigma=0.15, observed=data['Structural_Complexity'])
    SD_obs = pm.Normal('SD_obs', mu=time, sigma=0.05, observed=data['SD'])

    trace = pm.sample(1000, tune=1000, target_accept=0.9)

summary = az.summary(trace)
print(summary)
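
Note that `sura_indices` is computed in the script above but never used inside the model, so the known order does not currently constrain `time`. As a minimal sketch of how that partial order could at least be checked against a vector of estimated times, the snippet below tests whether the six known suras appear in strictly increasing order. The function `respects_known_order` is a hypothetical helper, `placeholder_time` is made-up data rather than model output, and the sketch assumes `sura_order` is listed from earliest to latest.

```python
import numpy as np

# Copied from the script above.
sura_order = ['sura_32', 'sura_45', 'sura_30', 'sura_12', 'sura_35', 'sura_13']
sura_labels = ['sura_6', 'sura_7', 'sura_10', 'sura_11', 'sura_12', 'sura_13', 'sura_14',
               'sura_16', 'sura_17', 'sura_18', 'sura_28', 'sura_29', 'sura_30', 'sura_31',
               'sura_32', 'sura_34', 'sura_35', 'sura_39', 'sura_40', 'sura_41', 'sura_42',
               'sura_45', 'sura_46']
sura_indices = [sura_labels.index(sura) for sura in sura_order]


def respects_known_order(time_values, ordered_indices):
    """True if the times at `ordered_indices` are strictly increasing.

    Hypothetical helper: assumes `ordered_indices` runs from the earliest
    known sura to the latest.
    """
    t = np.asarray(time_values, dtype=float)[ordered_indices]
    return bool(np.all(np.diff(t) > 0))


# Placeholder values only (NOT model output): times that simply increase
# with position in `sura_labels`.
placeholder_time = np.linspace(-0.5, 0.5, len(sura_labels))

print("indices of the known-order suras:", sura_indices)
print("placeholder respects known order:",
      respects_known_order(placeholder_time, sura_indices))
```

In practice the same check could be applied to the posterior mean of `time`, or to each posterior draw to estimate how often the known order is reproduced.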
