@iavicenna i have doubt can you please verify this below once
I want to do prior_predictive_check and check does my prior are good or not below is my model code and with the last line for prior_predictive_check
As the Number of rows in my data set is 398 so I took 150 samples
import pymc as pm
import numpy as np
# Preparing the data
cleaned_data['LocationDesc_encoded'] = cleaned_data['LocationDesc'].astype('category').cat.codes
# Define the model
with pm.Model() as hierarchical_model:
# Hyperpriors for group nodes
mu_a = pm.Normal('mu_a', 0.0, 0.5)
sigma_a = pm.Exponential('sigma_a', 1.0)
# Priors for individual intercepts
a = pm.Normal('a', mu=mu_a, sigma=sigma_a, shape=len(cleaned_data['LocationDesc_encoded'].unique()))
# Hyperpriors for group slopes
mu_b = pm.Normal('mu_b', 0.0, 0.5)
sigma_b = pm.Exponential('sigma_b', 1.0)
# Priors for individual slopes
b = pm.Normal('b', mu=mu_b, sigma=sigma_b, shape=(len(cleaned_data['LocationDesc_encoded'].unique()), 4))
# Model error
sigma = pm.Exponential('sigma', 1.0)
# Expected value
mu = a[cleaned_data['LocationDesc_encoded'].values] + \
b[cleaned_data['LocationDesc_encoded'].values, 0] * cleaned_data['obesity_Prevalence'] + \
b[cleaned_data['LocationDesc_encoded'].values, 1] * cleaned_data['data_value_py'] + \
b[cleaned_data['LocationDesc_encoded'].values, 2] * cleaned_data['Avg Temp(°F)'] + \
b[cleaned_data['LocationDesc_encoded'].values, 3] * cleaned_data['AQI']
# Likelihood
Y_obs = pm.Normal('Y_obs', mu=mu, sigma=sigma, observed=cleaned_data['Data_Value'])
idata = pm.sample_prior_predictive(samples=150, random_seed=42)
From the below reference, I read how to do a prior_predictice check and wrote the below code.
https://www.pymc.io/projects/docs/en/stable/learn/core_notebooks/posterior_predictive.html
import numpy as np
import xarray as xr
# Example dimensions
num_locations = 49
num_samples_per_location = 150
# Generate random values for each location within the specified range
x = xr.DataArray(
np.random.uniform(low=-2, high=2, size=(num_locations, num_samples_per_location)),
dims=['location', 'sample']
)
# Print shapes to verify
print("Shape of prior_a:", prior_a.shape) # Should be (1, 49, 150)
print("Shape of prior_b:", prior_b.shape) # Should be (1, 49, 150, 4)
print("Shape of x:", x.shape) # Should be (49, 150)
I wrote a model for each Location so here below calculating y for each location as the target is dependent on 4 predictors, So I used 4 predictors so here also calculating for all 4 predictors by running loop (k) from 0 to 3.
# Initialize y with zeros
y = np.zeros((num_locations, num_samples_per_location))
# Iterate over each location and sample
for location in range(num_locations):
for k in range(4):
# Compute y for the current location and sample
y[location, :] = prior_a[0, :, location] + \
np.sum(prior_b[0, :, location, :][k] * x[location, :])
# y now contains the computed values for each location and sample
print("Shape of y:", y.shape) #(49,150)
It’s always hard to tell on paper – the best is to plot their implication on the outcome scale, like that:
# Plotting the graph
plt.figure(figsize=(10, 6))
num_locations = 4
for location in range(num_locations):
plt.scatter(x[location, :],y[location,:], label=f"Location {location}" )
plt.xlabel('X')
plt.ylabel('Y')
plt.title('Y vs X for different locations')
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left', fontsize='small', ncol=2)
plt.show()
I uploaded the results I am getting from the above code and did only for 4 locations now, the questions I have are mentioned below.
Q-1) all the above things I did was correct or not if wrong how can I correct them, if possible can you provide code snippets
Q-2) if correct than what can I interpret from the above graphs or result
