Dear all,
I am trying to set up a Bayesian meta-analysis of report rates of events. My target estimand is the probability that participants report X in each study. Every participant provides ~20-40 responses, each of which is either X or non-X, so each study yields a mean rate of X across its subjects. I am trying to set up a Bayesian model to estimate that mean rate.
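To make the data-generating process concrete, here is how I picture a single study (the participant count, response counts, and true rate below are made-up for illustration, not my real data):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical single study: 30 participants, each giving 20-40 binary responses
true_p = 0.08                                # assumed true rate of reporting X
n_responses = rng.integers(20, 41, size=30)  # responses per participant
x_counts = rng.binomial(n_responses, true_p)

subject_rates = x_counts / n_responses       # per-participant rate of X
study_mean = subject_rates.mean()            # study-level summary I meta-analyse
study_se = subject_rates.std(ddof=1) / np.sqrt(len(subject_rates))
print(study_mean, study_se)
```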
So far, I have reasoned as follows.
- The distributions I work with have to be constrained to the 0-1 range, since they reflect the probability of reporting X.
- It is better to work on the log scale rather than with raw probabilities, given that my rates are typically close to 0-.1.
- I will approximate the standard error of the log rate as SE / p (delta method).
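As a sanity check on the delta-method approximation SE(log p) ≈ SE(p) / p, a quick simulation with a hypothetical rate and sample size (not part of the model itself):

```python
import numpy as np

rng = np.random.default_rng(42)

p, n = 0.08, 800                 # hypothetical true rate, responses per study
p_hat = rng.binomial(n, p, size=200_000) / n

se_p = np.sqrt(p * (1 - p) / n)  # SE of the raw rate
empirical_log_se = np.log(p_hat).std()
delta_log_se = se_p / p          # delta method: SE(log p) ~= SE(p) / p

print(empirical_log_se, delta_log_se)
```

The empirical standard deviation of the simulated log rates should agree closely with the delta-method value when n*p is not too small.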
Here are some simulated data and the model.
```python
import numpy as np
import pandas as pd
import pymc as pm
import arviz as az

np.random.seed(123)
study_means = np.exp(np.random.normal(np.log(.1), .1, 20))
study_se = np.random.uniform(0.02, 0.05, 20)

df = pd.DataFrame({
    "mean_rate": study_means,
    "se": study_se,
})
df["log_rate"] = np.log(df["mean_rate"])
df["log_se"] = df["se"] / df["mean_rate"]  # delta method

# Naive random-effects model
coords = {"random_effects": df.index.values}
with pm.Model(coords=coords) as base_model:
    # Set model data
    log_rate = pm.Data("log_rate", df["log_rate"].values, dims="random_effects")
    log_se = pm.Data("log_se", df["log_se"].values, dims="random_effects")
    # Priors
    theta = pm.TruncatedNormal("theta", mu=-.7, sigma=1, upper=0)
    tau = pm.HalfCauchy("tau", beta=.5)
    # Transformations
    theta_k = pm.TruncatedNormal("theta_k", mu=theta, sigma=tau, upper=0, dims="random_effects")
    # Likelihood
    obs = pm.TruncatedNormal("obs", mu=theta_k, sigma=log_se, upper=0, observed=log_rate, dims="random_effects")
    # Sample prior + posterior
    prior_ppc = pm.sample_prior_predictive(500)
    idata = pm.sample(5000, tune=5000, target_accept=0.999)
    posterior_ppc = pm.sample_posterior_predictive(trace=idata, var_names=["theta", "tau"])
```
I have solved the range issue by giving the normal distributions an upper bound of 0 on the log scale, which corresponds to a probability of 1. I have centered the prior for theta at -.7, which roughly centers the prior at p = .5 (exp(-0.7) ≈ 0.5). The HalfCauchy(0, .5) is a standard choice for modelling tau.
Depending on the simulated dataset (varying the mean and sigma of the study means), I get a few divergences.
Is there a way to reparametrize this model, or a different model specification, that would make the posterior easier to sample from? So far, increasing target_accept has lowered the number of divergences.
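For context on what I mean by reparametrizing: I understand the standard non-centered trick rewrites theta_k = theta + tau * z with z ~ Normal(0, 1). A quick NumPy check that this matches the centered parameterization distributionally (ignoring the upper=0 truncation, which is exactly the part I am unsure how to handle):

```python
import numpy as np

rng = np.random.default_rng(1)
theta, tau = -0.7, 0.3
n = 500_000

# Centered: theta_k drawn directly from Normal(theta, tau)
centered = rng.normal(theta, tau, size=n)

# Non-centered: standard normal draw, shifted and scaled deterministically
z = rng.normal(0.0, 1.0, size=n)
non_centered = theta + tau * z

print(centered.mean(), non_centered.mean())
print(centered.std(), non_centered.std())
```

Both sets of draws have the same mean and spread, but in the non-centered version the sampler only ever sees the standard normal z, which is what supposedly helps with the funnel geometry. How to combine this with the truncation at 0 is where I am stuck.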