Ok, I think I understand. Feel free to correct me if not. I assume the synthetic data and model you provided are for a single country. And then you run that same model twice, each time using data from a different country?
Below I have modified your example to merge what I believe to be your 2 models (1 model per country) that each include 4 parameters (an intercept, 2 slopes, and an error term) into a single model with 4 pairs of parameters. As I mentioned, including two b_A
parameters (1 per country) in a single trace will allow you to straightforwardly compare the posterior of each parameter. I have also modified the synthetic data, both to double the number of rows, and to add in a country
column that dummy codes which country the data is associated with. This variable is then used to select the appropriate half of each parameter pair from the model.
So let me know if I misunderstood anything or if you have any questions.
RSEED = 12354
import pandas as pd
import numpy as np
from numpy import random
import pymc3 as pm
# Sample data
# country 1
A = random.randint(0, 100, 500) * 0.01
B = random.randint(1, 6, 500)
Z = random.randint(1, 1000, 500) * 0.001
country = 0 * np.ones(500, dtype=int)
df1 = pd.DataFrame({"A": A, "B": B, "Z": Z, "country": country})
# country 2
A = random.randint(0, 100, 500) * 0.01
B = random.randint(1, 6, 500)
Z = random.randint(1, 1000, 500) * 0.001
country = 1 * np.ones(500, dtype=int)
df2 = pd.DataFrame({"A": A, "B": B, "Z": Z, "country": country})
df = pd.concat([df1, df2], ignore_index=True)
with pm.Model() as model1:
# priors
intercept = pm.Normal("intercept", mu=0, sd=10, shape=2)
b_A = pm.Normal("b_A", mu=0, sd=5, shape=2)
b_B = pm.Normal("b_B", mu=0, sd=5, shape=2)
# Model error
model_err = pm.HalfNormal("model_err", sd=0.5, shape=2)
# linear combination of predictors/parameters
linear = (
intercept[df["country"]]
+ b_A[df["country"]] * df["A"]
+ b_B[df["country"]] * df["B"]
)
# Likelyhood
y = pm.Beta(
"y", mu=pm.invlogit(linear), sd=model_err[df["country"]], observed=df["Z"]
)
# Inference
trace1 = pm.sample(2000, cores=1, init="adapt_diag")
# calculate p that b_A for country 1
# is greater than b_A for country 2
print(np.mean(trace1["b_A"][:, 0] > trace1["b_A"][:, 1]))