Say I am trying to model blood alcohol concentration (BAC) in individuals after consuming alcohol. For each individual I have data describing:
n_drinks: number of drinks consumed
sex_is_female_01: 1 when F, 0 when M
bac: BAC as measured x minutes after consumed alcohol (x is fixed for all participants)
I make the simple model below to predict BAC from
with pm.Model() as model: mu_intercept = Normal('mu_intercept', mu = 0.0, tau = 0.0001) sigma_intercept = HalfCauchy('sigma_intercept', 5) a = Normal('a', mu = mu_intercept, sigma = sigma_intercept) b = Normal('b', mu = 0.0, sigma = 1e5, shape = 2) sd_y = HalfCauchy('sd_y', 5) y_hat = a + b*n_drinks+ b*sex_is_female_01 y_like = Normal('y_like', mu = y_hat, sigma = sd_y, observed = bac)
Fortunately this works quite well and I can show BAC independently depends on
n_drinks. However, I also know that males tend to be heavier than females, and I start to wonder if
sex_is_female_01 may actually be a proxy for weight. Sadly, I do not have weight data for my participants. . . I can however find data online describing the distribution of male and female weights in the population from which my participants were drawn (https://jech.bmj.com/content/40/4/319 – yes I realize people have gotten fatter since, but let’s say these are still accurate. Alternatively say it is 19, 19, 1985 (youtube), the year the data in the referenced paper was collected). I code up another model including that distribution as a predictor.
Edit: I am realized since posting that I was attempting to implement a bayesian approach to simply creating an additional independent variable and filling it with the average female weight for female subjects and average male weight for male subjects. Thus, the created variable containing mean weights would be completely dependent on sex, and should not be included in the model (obviously correct me if I am wrong). That being said, I would still like to know if the model I implemented below does what I intended. Thank you.
with pm.Model() as vary_intercept: mu_intercept = Normal('mu_intercept', mu = 0.0, tau = 0.0001) sigma_intercept = HalfCauchy('sigma_intercept', 5) #female weight distribution from referenced paper weight_f = Normal('weight_f', mu = 61, sigma = 11) #male weight distribution from referenced paper weight_m = Normal('weight_m', mu = 76, sigma = 11.5) #distribution of weight for a given subject, depending on sex weight = weight_m*(1 - subject_is_female_01) + weight_f*(subject_is_female_01) a = Normal('a', mu = mu_intercept, sigma = sigma_intercept) b = Normal('b', mu = 0.0, sigma = 1e5, shape = 3) sd_y = HalfCauchy('sd_y', 5) y_hat = a + b*n_drinks+ b*subject_is_female_01 + b * weight y_like = Normal('y_like', mu = y_hat, sigma = sd_y, observed = bac)
I was worried that the model would not run, as there would seem to a dimensional mismatch in
y_hat – we only have one sex per participant, but a whole distribution of possible weights for each. But model sampled as expected.
My questions are:
- What have I misunderstood about the shape of the terms in
y_hatthat led me to believe this would not work?
- Does the model represent what I intended?