Good morning everyone. I have just started studying Bayesian statistics because I was asked to develop a Bayesian network in Python. I have concentration data for a chemical compound that undergoes a treatment, and I would like to use Bayes’ theorem to calculate the posterior probability of the concentration (C) of my compound after the treatment. The concentration data are very noisy. Looking through blogs, I read that many people start by estimating the probability distribution of the variables, so I wrote this code to estimate the posterior distribution of C:
import arviz as az
import numpy as np
import pymc as pm

data_obs = [
8.07, 7.5, 7.68, 7.9, 7.97, 8.61, 7.43, 7.8, 7.84, 6.61, 7.3, 9.5,
6.46, 7.72, 8.77, 8.47, 8.78, 8.52, 9.82, 10.6, 11.1, 6.7971, 6.688,
6.2711, 7.2524, 9.2091, 6.6683, 9.8166, 19.026, 13.3115, 10.4902, 12.8995,
13.8184, 9.4221, 8.6046, 4.0691, 0.9683, 1.2485, 1.391, 0.25, 0.25, 0.25,
13.4918, 18.4025, 16.5562, 21.5476, 12.4482, 13.8427, 17.9769, 11.6711
]  # made-up measurements
max_obs = np.max(data_obs)
mu = max_obs / 2   # prior mean for GM
std = max_obs / 4  # prior standard deviation for GM
GSD = 8     # not used below
sigma = 21  # not used below
log_GSD_sigma = 1.0  # placeholder value (assumption): I had not defined this variable

with pm.Model() as model:
    # Priors
    GSD = pm.LogNormal('GSD', mu=1, sigma=log_GSD_sigma)  # geometric standard deviation
    GM = pm.Normal('GM', mu=mu, sigma=std)  # geometric mean; log(GM) is undefined if GM <= 0
    mu_log = pm.Deterministic('mu_log', pm.math.log(GM))
    sigma_log = pm.Deterministic('sigma_log', pm.math.log(GSD))
    # Likelihood of the observed concentrations
    likelihood = pm.LogNormal('likelihood', mu=mu_log, sigma=sigma_log, observed=data_obs)
    # Unobserved concentration drawn from the same distribution (prediction)
    y_pred = pm.LogNormal('y_pred', mu=mu_log, sigma=sigma_log)
    # Sampling: 20000 posterior draws per chain after 1000 tuning steps (NUTS is the default sampler)
    idata = pm.sample(draws=20000, tune=1000, cores=1)

az.summary(idata, round_to=2)
I obtain two posterior distributions, one for GM (the geometric mean) and one for GSD (the geometric standard deviation). Now, however, I don’t know how to proceed to calculate the posterior probability of C. That is, I now have the parameters that describe the probability distribution of C, but if my compound undergoes a treatment, the probability distribution of C will be conditioned on the distribution of the treatment, which will also have its own distribution parameters. How can I proceed? Should I fit the data using the mean values of GSD and GM and use the resulting distribution as the prior distribution for the first node? Moreover, is it possible to discretize the distribution obtained this way? I apologize for the length of the message, and in advance if the question is stupid, but I am very new to this area of statistics and I have been stuck for a long time. Thank you very much for any help.
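To make the discretization question more concrete, here is a minimal sketch of what I have in mind. It uses only NumPy. The arrays gm_draws and gsd_draws are made-up stand-ins for the posterior draws (they are not results from my model), and the cut points are hypothetical. I am not sure this is the right approach.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in posterior draws (made-up numbers, NOT output of the model above).
# With a real run they could be taken from the trace, e.g.
# idata.posterior["GM"].values.ravel() and idata.posterior["GSD"].values.ravel().
gm_draws = rng.normal(8.0, 0.5, size=5000).clip(min=0.1)
gsd_draws = rng.normal(2.0, 0.1, size=5000).clip(min=1.01)

# One predictive concentration per posterior draw, so that the uncertainty
# in GM and GSD is carried through instead of plugging in their means.
c_pred = rng.lognormal(mean=np.log(gm_draws), sigma=np.log(gsd_draws))

# Hypothetical interior cut points; they define 5 discrete states for node C:
# [0, 2), [2, 5), [5, 10), [10, 20), [20, inf)
cuts = np.array([2.0, 5.0, 10.0, 20.0])
state = np.digitize(c_pred, cuts)  # state index 0..4 for each sample

# Relative frequency of each state = discretized probability table for C.
probs = np.bincount(state, minlength=len(cuts) + 1) / c_pred.size

for k, p in enumerate(probs):
    print(f"state {k}: {p:.3f}")
```

If this is reasonable, probs would be the probability table of the (untreated) concentration node, and I suppose the treatment node would need a similar table conditioned on it.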