 # How best to model risk ratio

Hi, I’m cross-posting from stackoverflow since I just realised I might be better off asking here. https://stackoverflow.com/questions/56928544/how-can-i-calculate-risk-ratio-using-pymc3

I’d like to model the risk ratio for data in a survey, but I’m having difficulty framing the problem to pymc3. Risk ratio is the ratio of probability of some outcome given an exposure, to the probability without exposure.

So as I understand it, risk ratio is: (incidence in exposed group) / (incidence in notExposed group)

I’ve tried posing the problem as shown in the sample code below:

``````import pymc3

exposed_num_observations = 25
exposed_observed = 11
notExposed_num_observations = 853
notExposed_observed = 202

with pm.Model() as rr_mdoel:
# Prior
p_exposed = pm.Uniform("p_exposed", 0 ,1)
p_notExposed = pm.Uniform("p_notExposed", 0 ,1)
sd = pm.Uniform("sd", 0,5)

# Likelihood
exposed = pm.Binomial("exposed",
p=p_exposed,
n=exposed_num_observations,
observed=exposed_observed)
notExposed = pm.Binomial("notExposed",
p=p_notExposed,
n=notExposed_num_observations,
observed=notExposed_observed)

risk_ratio = pm.Normal("risk_ratio", mu=(p_exposed/p_notExposed), sd=sd)

# Inference Run/ Markov chain Monte Carlo
trace = pm.sample(10000, chains=2, tune=1000)
``````

I expected the `risk_ratio` to always be above 0 (because you cant have a negative risk ratio), and I expected `sd` to be fit to some value. The results (as shown by `az.plot_posterior(trace, round_to=3)` for example) show some negative values for `risk_ratio` and `sd` is still uniform.

Any tips here are greatly appreciated!

The risk ratio is a strictly deterministic function of the probabilities. This means it should be defined as

``````risk_ratio = pm.Deterministic('risk_ratio', p_exposed/p_notExposed)

``````

That way when you sample from the joint posterior of `"exposed"` and `"notExposed"`, they are converted into the (univariate) distribution of `"risk_ratio"`. If you want to estimate the mean and standard deviation, you can pull it out of

``````np.std(trace['risk_ratio'])
``````

By contrast, what this line does:

``````risk_ratio = pm.Normal("risk_ratio", mu=(p_exposed/p_notExposed), sd=sd)
``````

is define a new Normal random variable with mean `p_exposed/p_notExposed` and deviation `sd`. Because there aren’t any observations of this random variable, there is no information to constrain `sd`, which is why the posterior of `sd` exactly matches the prior.

Fantastic - that seems to work as expected! Thanks for an informative response

edit: feel free to leave that answer on stackoverflow too if you use it