How best to model risk ratio

lewiso1 · July 8, 2019, 5:12am

Hi, I’m cross-posting from stackoverflow since I just realised I might be better off asking here. https://stackoverflow.com/questions/56928544/how-can-i-calculate-risk-ratio-using-pymc3

I’d like to model the risk ratio for data in a survey, but I’m having difficulty framing the problem to pymc3. Risk ratio is the ratio of probability of some outcome given an exposure, to the probability without exposure.

So as I understand it, risk ratio is: (incidence in exposed group) / (incidence in notExposed group)

I’ve tried posing the problem as shown in the sample code below:

import pymc3

exposed_num_observations = 25
exposed_observed = 11
notExposed_num_observations = 853
notExposed_observed = 202

with pm.Model() as rr_mdoel:
    # Prior
    p_exposed = pm.Uniform("p_exposed", 0 ,1)
    p_notExposed = pm.Uniform("p_notExposed", 0 ,1)
    sd = pm.Uniform("sd", 0,5)

    # Likelihood
    exposed = pm.Binomial("exposed", 
                      p=p_exposed, 
                      n=exposed_num_observations,  
                      observed=exposed_observed)
    notExposed = pm.Binomial("notExposed", 
                             p=p_notExposed, 
                             n=notExposed_num_observations,  
                             observed=notExposed_observed)

    risk_ratio = pm.Normal("risk_ratio", mu=(p_exposed/p_notExposed), sd=sd)

    # Inference Run/ Markov chain Monte Carlo 
    trace = pm.sample(10000, chains=2, tune=1000)

I expected the risk_ratio to always be above 0 (because you cant have a negative risk ratio), and I expected sd to be fit to some value. The results (as shown by az.plot_posterior(trace, round_to=3) for example) show some negative values for risk_ratio and sd is still uniform.

Any tips here are greatly appreciated!

chartl · July 8, 2019, 6:50am

The risk ratio is a strictly deterministic function of the probabilities. This means it should be defined as

risk_ratio = pm.Deterministic('risk_ratio', p_exposed/p_notExposed)

That way when you sample from the joint posterior of "exposed" and "notExposed", they are converted into the (univariate) distribution of "risk_ratio". If you want to estimate the mean and standard deviation, you can pull it out of

np.std(trace['risk_ratio'])

By contrast, what this line does:

risk_ratio = pm.Normal("risk_ratio", mu=(p_exposed/p_notExposed), sd=sd)

is define a new Normal random variable with mean p_exposed/p_notExposed and deviation sd. Because there aren’t any observations of this random variable, there is no information to constrain sd, which is why the posterior of sd exactly matches the prior.

lewiso1 · July 8, 2019, 8:47pm

Fantastic - that seems to work as expected! Thanks for an informative response

edit: feel free to leave that answer on stackoverflow too if you use it

Topic		Replies	Views
Negative binomial model with exposure version agnostic gaussian_process , modeling	2	290	February 13, 2024
Advice on Modeling Censored Data from a Ratio of Variables version agnostic	1	20	March 13, 2025
How to model observed percentages (bounded from 0 to 1) Questions	8	2734	January 3, 2018
Problem with Deterministic variable Questions	7	1022	March 22, 2021
Bayesian model with constraints Questions	1	471	September 16, 2020

How best to model risk ratio

Related topics