# Likelihood of non-normally distributed RVs

I have tasks that consist of exactly two components, but I have not observed the actual duration of each component, only the combined duration of the task. A task type can appear as either the first or the second component. I am interested in the duration distribution of every task type.
So, I want to model a set of chained equations `pre_{ij} + post_{ik} = b_i`, where `i \in 0, ..., N` indexes the equations and `j, k \in 0, ..., M` index the M finite different task types. For example, observation `b_0` might be the type-1 pre duration plus the type-0 post duration.
I assume that the durations are log-normally distributed.

I think the selection of a single RV per pre/post could be done with pm.Mixture (in my case every mixture component should be log-normally distributed as well); however, I struggle with the sum of the two.
Below you can find a simulation that obviously does not work, since pm.Deterministic has no observed keyword. As far as I know, there is no closed form for the likelihood of a sum of log-normal distributions, so I can't use a DensityDist / Potential instead.
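(As an aside: the standard workaround for the missing closed form is the Fenton–Wilkinson approximation, which moment-matches the sum of independent log-normals with a single log-normal. Whether that is accurate enough depends on the sigmas; this is an assumption on my part, not something the model above requires. A minimal NumPy sketch:)

```python
import numpy as np

rng = np.random.default_rng(42)

# Two independent log-normal components (illustrative parameters)
mu = np.array([0.5, 1.0])
sigma = np.array([0.4, 0.3])

# Exact mean and variance of each component, and of their sum
means = np.exp(mu + sigma**2 / 2)
variances = (np.exp(sigma**2) - 1) * np.exp(2 * mu + sigma**2)
m_sum, v_sum = means.sum(), variances.sum()

# Fenton-Wilkinson: pick mu_fw, sigma_fw so a single log-normal
# reproduces the sum's first two moments exactly
sigma_fw2 = np.log(1 + v_sum / m_sum**2)
mu_fw = np.log(m_sum) - sigma_fw2 / 2

# Monte Carlo check: empirical moments of the sum vs. the matched targets
draws = rng.lognormal(mu[0], sigma[0], 200_000) + rng.lognormal(mu[1], sigma[1], 200_000)
print(draws.mean(), m_sum)  # should be close
print(draws.var(), v_sum)   # should be close
```

The matched log-normal could then serve as an approximate likelihood, but it only captures the first two moments, not the exact density of the sum.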

```python
import arviz as az
import pandas as pd
import pymc as pm
from scipy import stats

mu_pre_true = [2, 20, 5]
mu_post_true = [9, 3, 5]

sigma_pre_true = [5, 7, 1]
sigma_post_true = [3, 5, 4]

n_samples = 500

# Only one weight for pre and post can be active at a time (one-hot rows, probs sum to 1).
w_pre = pd.DataFrame(
    stats.multinomial.rvs(n=1, p=[1 / len(mu_pre_true)] * len(mu_pre_true), size=n_samples)
)
w_post = pd.DataFrame(
    stats.multinomial.rvs(n=1, p=[1 / len(mu_post_true)] * len(mu_post_true), size=n_samples)
)

# Simulate the true observed values
pre_true = pm.LogNormal.dist(mu=mu_pre_true, sigma=sigma_pre_true)
pre_draws = pm.draw(pre_true, draws=w_pre.shape[0])

post_true = pm.LogNormal.dist(mu=mu_post_true, sigma=sigma_post_true)
post_draws = pm.draw(post_true, draws=w_post.shape[0])

samples = (pre_draws * w_pre).sum(axis=1) + (post_draws * w_post).sum(axis=1)

with pm.Model(
    coords={
        "type": w_pre.columns,
        "obs": w_pre.index,
    }
) as base_model:

    # Priors
    pre_dist_mu = pm.HalfNormal(name="pre_mu", sigma=5, dims="type")
    post_dist_mu = pm.HalfNormal(name="post_mu", sigma=3, dims="type")
    pre_dist_sigma = pm.HalfCauchy(name="pre_sigma", beta=5, dims="type")
    post_dist_sigma = pm.HalfCauchy(name="post_sigma", beta=3, dims="type")

    # Parameters of interest (unnamed dists, as pm.Mixture expects for comp_dists)
    pre_dist = pm.LogNormal.dist(mu=pre_dist_mu, sigma=pre_dist_sigma)
    post_dist = pm.LogNormal.dist(mu=post_dist_mu, sigma=post_dist_sigma)

    # Weights
    pre_wt = pm.MutableData("pre_wt", w_pre, dims=["obs", "type"])
    post_wt = pm.MutableData("post_wt", w_post, dims=["obs", "type"])

    pre_mix = pm.Mixture("pre_mix", w=pre_wt, comp_dists=pre_dist)
    post_mix = pm.Mixture("post_mix", w=post_wt, comp_dists=post_dist)

    # Likelihood -- this is the part that does not work:
    # pm.Deterministic has no observed keyword.
    # obs = pm.Mixture("Likelihood", comp_dists=[pre_mix, post_mix], observed=samples)
    obs = pm.Deterministic("Like", pre_mix + post_mix, observed=samples)

    trace = pm.sample(2000, tune=2000, chains=4)

az.summary(trace)
```


Does anyone have a suggestion on how to handle this sum in PyMC?