# Likelihood of non-normally distributed RVs

I have tasks that consist of exactly two components, but I have not observed the actual duration of each component, only the combined duration of the task. A task type can appear as either the first or the second component. I am interested in the duration distribution of every task type.
So, I want to model a set of chained equations `pre_{ij} + post_{ik} = b_i`, where `i \in 0, ..., N` indexes the equations and `j, k \in 0, ..., M` index the M finite different task types. For example, observation `b_0` might be the type-1 pre duration plus the type-0 post duration.
I assume that the durations are log-normally distributed.

I think the selection of a single RV per pre/post could be done with pm.Mixture (in my case every mixture component should be log-normally distributed as well); however, I struggle with the sum of the two.
Below you can find a simulation that obviously does not work, since pm.Deterministic has no observed keyword. As far as I know, there is no closed form for the likelihood of a sum of log-normal distributions, so I can't use a DensityDist / Potential instead.
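(As an aside: the standard workaround for the missing closed form is the Fenton–Wilkinson approximation, which moment-matches the sum of independent log-normals with a single log-normal. Whether that is accurate enough depends on the sigmas; this is an assumption on my part, not something the model above requires. A minimal NumPy sketch:)

```python
import numpy as np

rng = np.random.default_rng(42)

# Two independent log-normal components (illustrative parameters)
mu = np.array([0.5, 1.0])
sigma = np.array([0.4, 0.3])

# Exact mean and variance of each component, and of their sum
means = np.exp(mu + sigma**2 / 2)
variances = (np.exp(sigma**2) - 1) * np.exp(2 * mu + sigma**2)
m_sum, v_sum = means.sum(), variances.sum()

# Fenton-Wilkinson: pick mu_fw, sigma_fw so a single log-normal
# reproduces the sum's first two moments exactly
sigma_fw2 = np.log(1 + v_sum / m_sum**2)
mu_fw = np.log(m_sum) - sigma_fw2 / 2

# Monte Carlo check: empirical moments of the sum vs. the matched targets
draws = rng.lognormal(mu[0], sigma[0], 200_000) + rng.lognormal(mu[1], sigma[1], 200_000)
print(draws.mean(), m_sum)  # should be close
print(draws.var(), v_sum)   # should be close
```

The matched log-normal could then serve as an approximate likelihood, but it only captures the first two moments, not the exact density of the sum.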

```python
import arviz as az
import pandas as pd
import pymc as pm
from scipy import stats

mu_pre_true = [2, 20, 5]
mu_post_true = [9, 3, 5]

sigma_pre_true = [5, 7, 1]
sigma_post_true = [3, 5, 4]

n_samples = 500

# Only one weight for pre and post can be active at a time (one-hot rows, probs sum to 1).
w_pre = pd.DataFrame(
    stats.multinomial.rvs(n=1, p=[1 / len(mu_pre_true)] * len(mu_pre_true), size=n_samples)
)
w_post = pd.DataFrame(
    stats.multinomial.rvs(n=1, p=[1 / len(mu_post_true)] * len(mu_post_true), size=n_samples)
)

# Simulate the true observed values
pre_true = pm.LogNormal.dist(mu=mu_pre_true, sigma=sigma_pre_true)
pre_draws = pm.draw(pre_true, draws=w_pre.shape[0])

post_true = pm.LogNormal.dist(mu=mu_post_true, sigma=sigma_post_true)
post_draws = pm.draw(post_true, draws=w_post.shape[0])

samples = (pre_draws * w_pre).sum(axis=1) + (post_draws * w_post).sum(axis=1)

with pm.Model(
    coords={
        "type": w_pre.columns,
        "obs": w_pre.index,
    }
) as base_model:

    # Priors
    pre_dist_mu = pm.HalfNormal(name="pre_mu", sigma=5, dims="type")
    post_dist_mu = pm.HalfNormal(name="post_mu", sigma=3, dims="type")
    pre_dist_sigma = pm.HalfCauchy(name="pre_sigma", beta=5, dims="type")
    post_dist_sigma = pm.HalfCauchy(name="post_sigma", beta=3, dims="type")

    # Parameters of interest (unnamed dists, as pm.Mixture expects for comp_dists)
    pre_dist = pm.LogNormal.dist(mu=pre_dist_mu, sigma=pre_dist_sigma)
    post_dist = pm.LogNormal.dist(mu=post_dist_mu, sigma=post_dist_sigma)

    # Weights
    pre_wt = pm.MutableData("pre_wt", w_pre, dims=["obs", "type"])
    post_wt = pm.MutableData("post_wt", w_post, dims=["obs", "type"])

    pre_mix = pm.Mixture("pre_mix", w=pre_wt, comp_dists=pre_dist)
    post_mix = pm.Mixture("post_mix", w=post_wt, comp_dists=post_dist)

    # Likelihood -- this is the part that does not work:
    # pm.Deterministic has no observed keyword.
    # obs = pm.Mixture("Likelihood", comp_dists=[pre_mix, post_mix], observed=samples)
    obs = pm.Deterministic("Like", pre_mix + post_mix, observed=samples)

    trace = pm.sample(2000, tune=2000, chains=4)

az.summary(trace)
```


Does anyone have a suggestion on how to handle this sum in PyMC?