Modelling multiple correlated variables

I am completely new to PyMC3 and really like the API and the flexibility that comes with it.

However, I am stuck modelling a cash flow model. For the sake of simplicity, let's say I want to model the following:

y = \sum_{t=0}^T \dfrac{CF_t}{(1+r)^t}

where the CF_t and r are normally distributed and the CF_t are correlated with each other.

How would I best model this, assuming the following priors:

  1. all CFs follow a normal distribution with different mus and stds
  2. all CFs are correlated
  3. r follows a normal distribution

My main challenges are:

  1. How do I best create T parameters with different mus and stds? I know that I can pass a shape parameter, but can I also pass different mus and stds? Or is the only way to create individual parameters?
  2. How do I model that y is the sum of different parameters?
  3. How do I model the correlation part? (This is likely a probabilistic-programming question rather than a PyMC3 question.)

My attempt below works but has two pitfalls:

  1. if T = 100 I would have to model 100 separate variables
  2. it does not account for correlations between cf1, cf2 and cf3
import pymc3 as pm

T = 3
with pm.Model() as model:
    # Define priors
    cf1 = pm.Normal("cf1", 100, 10)
    cf2 = pm.Normal("cf2", 100, 15)
    cf3 = pm.Normal("cf3", 100, 20)

    i = pm.Normal("i", 0.1, 0.04)
    t = range(T)

    # Model output; wrap in Deterministic to keep it in the trace
    y = pm.Deterministic("y", cf1 / (1 + i) ** t[0] + cf2 / (1 + i) ** t[1] + cf3 / (1 + i) ** t[2])

    trace = pm.sample()

Can anyone point me in the right direction?

Thanks for your help!

(Note that this is a very simplified version of what I really want to achieve)

I think I can answer 3: if you want to include correlations in your prior, you need to create a covariance matrix, which contains the information about the correlations between the variables (covariance is a measure of how the variables vary together). Then you load that covariance matrix into a multivariate distribution, which in your case would be:
cfs = pm.MvNormal("cfs", mu=[100, 100, 100], cov=your_covariance_matrix, shape=3)

Great, thanks for the swift help! I tried this multiple times and it didn't work because I had forgotten to set the shape parameter… THANKS!

Random remark, are you sure you want to model i/r as normal, and allow it to be negative? Maybe you want a half-normal.

Yes, you are right. I will not model it as normal. This was just a toy example and the normal prior does not make sense here. I should have used a different prior to avoid causing confusion.

I have a follow-up PyMC3 question. What would I have to do if I want to set the correlation between variables that are not all normal but a mixture of different kinds of distributions? In that case it would not be appropriate to use a multivariate normal (pm.MvNormal), as it only generates normal marginals. Is there something like a multivariate distribution that can output different kinds of marginals? I would like to do something similar to what is possible with Crystal Ball, where you can correlate input variables that have different distributions → https://www.oracle.com/docs/tech/middleware/correlated-assumptions.pdf

I think the easiest option is to move the correlation out of the output distributions and into the structure of the model. For example, this paper (implemented in PyMC here) uses latent structure to model conditionally independent Poisson variables, which end up being correlated due to the hierarchical model structure.
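As a toy illustration of that idea (plain numpy, not the paper's model): two Poisson variables that share a latent factor are conditionally independent given the factor, but marginally correlated. The loadings below are made up:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000

# Shared latent factor drives the rates of both counts
z = rng.normal(size=n)
lam1 = np.exp(0.5 + 0.8 * z)   # assumed intercepts and loadings
lam2 = np.exp(0.2 + 0.8 * z)

# Conditionally independent Poisson draws given z
y1 = rng.poisson(lam1)
y2 = rng.poisson(lam2)

# Marginally, the counts are correlated purely through z
print(np.corrcoef(y1, y2)[0, 1])
```

In a PyMC model you would put a prior on z (and on the loadings) and let the hierarchy induce the correlation, while the likelihood stays whatever family each output needs.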

I think the other alternative would be to work with copula models. @jonsedar has been doing a lot of work on these lately, I think he might be able to chime in with some general remarks on the subject?

@jessegrabowski - Thank you very much for the links, they were really helpful :+1:


You rang? Yes, I've been doing a few things with copulas on marginals recently. Lots of bashing my head against the wall, but I could try to help if you want.

Was just curious if you had any comments on the usefulness of directly modeling correlation between RVs by using copulas vs injecting conditional correlation via model structure.

Disclaimer: my knowledge of copulas begins and ends with how to spell the word.

I’m probably so far down the copula rabbit hole that (potentially better) alternatives are a distant hope… sunk costs and all that :smiley:

I think copulas are quite nice when you want to allow the marginals to correlate in a pooled fashion, regardless of the sub-models on each marginal; my hope is that this gains stability and flexibility in the design of the sub-models, and also makes it easier to use non-Gaussian copulas.

The alternative would be to require several features to form the sub-models of both marginals, and to correlate the sub-model coefficients. I think this would be a good example: McElreath, 2014, where he uses an MvN to correlate hierarchical hyperparameters. I'm not sure if/how one could achieve this with a non-Gaussian copula…
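For anyone landing here later, the basic Gaussian-copula recipe (similar in spirit to what Crystal Ball's correlated assumptions do) can be sketched with scipy: draw correlated standard normals, push them through the normal CDF to get correlated uniforms, then through the inverse CDFs of whatever marginals you like. The correlation and the marginals below are invented:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 50_000

# 1. Correlated standard normals carry the dependence structure
corr = np.array([[1.0, 0.7],
                 [0.7, 1.0]])
z = rng.multivariate_normal(mean=[0, 0], cov=corr, size=n)

# 2. Normal CDF -> correlated uniforms on [0, 1]
u = stats.norm.cdf(z)

# 3. Inverse CDFs of arbitrary marginals (gamma and lognormal here)
x1 = stats.gamma.ppf(u[:, 0], a=2.0, scale=50.0)
x2 = stats.lognorm.ppf(u[:, 1], s=0.25, scale=100.0)

# x1 and x2 keep their chosen marginals but are rank-correlated
```

Note that the resulting Pearson correlation of x1 and x2 is not exactly 0.7; a Gaussian copula preserves rank correlation, not linear correlation, across the marginal transforms.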
