How to model the problem of sum of normally distributed values

Hi!

Imagine that I have a process that generates a value every day of the week from one of three normal distributions (with means {0, 1, 2} and sd=1). I only observe the sum of the values per week. Each value is normally distributed, so the observed sum will also be normally distributed (see Sum of normally distributed random variables - Wikipedia).
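For reference, in the simulation below three values are drawn each week, so conditional on the per-week counts $(c_0, c_1, c_2)$ with $c_0 + c_1 + c_2 = 3$ the weekly sum is

$$S \mid (c_0, c_1, c_2) \sim \mathcal{N}\left(0 \cdot c_0 + 1 \cdot c_1 + 2 \cdot c_2,\ \sqrt{3}\right),$$

i.e. its mean is the count-weighted sum of the component means and its sd is $\sqrt{3}$ (three independent values, each with sd 1). This is what the likelihood in the model below encodes.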

I simulated the data and coded the model, but despite a large number of draws the model still does not converge.

Simulated data and model:

import numpy as np

# simulate 100 weekly sums: per-week counts for the three components give the locations of three sd=1 draws, which are then summed
d = np.random.normal(loc=(np.random.multinomial(3, pvals=np.random.dirichlet([4,1,2]), size=(100,)) * np.array([0,1,2])), scale=1).sum(axis=1)

import pymc3 as pm  # assuming PyMC3 (v3.11+); in PyMC v4+ the sd keyword becomes sigma

with pm.Model() as test_model:
    activities_allocation = pm.Dirichlet("activities_allocation", a=np.array([4, 1, 2]), shape=3)
    m = pm.DirichletMultinomial("count", n=3, a=activities_allocation, shape=(1, 3))

    mu_first = pm.Normal("mu_first", mu=0, sd=1)
    mu_second = pm.Normal("mu_second", mu=1, sd=1)
    mu_third = pm.Normal("mu_third", mu=2, sd=1)

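    # weekly sum: the mean is the count-weighted sum of the three peak locations,
    # and sd = sqrt(3) because three values with sd = 1 are summed each week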
    pm.Normal(
        'sum',
        mu=m[:, 0] * mu_first + m[:, 1] * mu_second + m[:, 2] * mu_third,
        sigma=np.sqrt(3),
        observed=d,
    )

with test_model:
    idata = pm.sample(20000, tune=40000, return_inferencedata=True)

PyMC output:

Multiprocess sampling (4 chains in 4 jobs)
CompoundStep
>NUTS: [mu_third, mu_second, mu_first, activities_allocation]
>Metropolis: [count]

Sampling 4 chains for 40_000 tune and 20_000 draw iterations (160_000 + 80_000 draws total) took 204 seconds.
There were 3802 divergences after tuning. Increase `target_accept` or reparameterize.
The acceptance probability does not match the target. It is 0.6462577439757295, but should be close to 0.8. Try to increase the number of tuning steps.
There were 11388 divergences after tuning. Increase `target_accept` or reparameterize.
The acceptance probability does not match the target. It is 0.1561264635253388, but should be close to 0.8. Try to increase the number of tuning steps.
There were 1236 divergences after tuning. Increase `target_accept` or reparameterize.
There were 9053 divergences after tuning. Increase `target_accept` or reparameterize.
The acceptance probability does not match the target. It is 0.35609104283536835, but should be close to 0.8. Try to increase the number of tuning steps.
The rhat statistic is larger than 1.4 for some parameters. The sampler did not converge.
The estimated number of effective samples is smaller than 200 for some parameters.

I would be grateful for any help and insights: how do I correctly model the problem I described so that the model converges and the effective sample sizes are larger?

I would try to parameterize the model in terms of mixture weights * total sum * peak locations (e.g. with a Dirichlet * Exponential * vector of normals), instead of modelling individual counts * peak locations as you did, and see what happens.
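For concreteness, here is a minimal sketch of one way to read that suggestion; the variable names, the Exponential rate, and target_accept below are my own guesses rather than a tested solution:

import numpy as np
import pymc3 as pm

with pm.Model() as reparam_model:
    # mixture weights over the three components
    w = pm.Dirichlet("w", a=np.array([4.0, 1.0, 2.0]), shape=3)
    # positive scale for the weekly total; three values are drawn per week,
    # so a prior with mean 3 is one reasonable starting point
    total = pm.Exponential("total", lam=1.0 / 3.0)
    # peak locations of the three components
    mus = pm.Normal("mus", mu=np.array([0.0, 1.0, 2.0]), sd=1.0, shape=3)

    # weekly sum: scale times the weight-averaged peak locations, sd = sqrt(3)
    pm.Normal("sum", mu=total * pm.math.dot(w, mus), sigma=np.sqrt(3), observed=d)

    idata = pm.sample(2000, tune=2000, target_accept=0.9, return_inferencedata=True)

Keeping everything continuous (no latent counts for Metropolis to step through) is usually what lets NUTS converge on this kind of model.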

Also, you might want to introduce an ordering constraint across your normals if there is multimodality in the posterior.
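If you keep separate peak locations, the ordering constraint could look roughly like this in PyMC3 (the ordered transform plus a sorted testval; in newer PyMC versions the initial value keyword is initval):

mus = pm.Normal(
    "mus",
    mu=np.array([0.0, 1.0, 2.0]),
    sd=1.0,
    shape=3,
    transform=pm.distributions.transforms.ordered,
    testval=np.array([0.0, 1.0, 2.0]),
)

This forces mus[0] <= mus[1] <= mus[2], so the sampler cannot jump between equivalent relabellings of the components.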