Modelling multiple correlated variables

I am completely new to PyMC3 and really like the API and the flexibility that comes with it.

However, I am stuck modelling a cash flow model. For the sake of simplicity, let's say I want to model the following:

y = \sum_{t=0}^T \dfrac{CF_t}{(1+r)^t}

where the CF_t and r are normally distributed and the CF_t are correlated with each other.

How would I best model this, assuming the following priors:

  1. all CFs follow a normal distribution with different mus and stds
  2. all CFs are correlated
  3. r follows a normal distribution

My main challenges are:

  1. How do I best create T parameters with different mus and stds? I know that I can pass a shape parameter, but can I also pass different mus and stds? Or is the only way to create individual parameters?
  2. How do I model that y is the sum of different parameters?
  3. How do I model the correlation part? (This is likely a probabilistic-programming question rather than a PyMC3 question.)

My attempt below works but has two pitfalls:

  1. if T = 100 I would have to model 100 separate variables
  2. it does not account for correlations between cf1, cf2 and cf3
import pymc3 as pm

T = 3
with pm.Model() as model:
    # Define priors
    cf1 = pm.Normal("cf1", 100, 10)
    cf2 = pm.Normal("cf2", 100, 15)
    cf3 = pm.Normal("cf3", 100, 20)

    i = pm.Normal("i", 0.1, 0.04)
    t = range(T)

    # Model output; wrap in Deterministic to keep it in the trace
    y = pm.Deterministic("y", cf1 / (1 + i) ** t[0] + cf2 / (1 + i) ** t[1] + cf3 / (1 + i) ** t[2])

    trace = pm.sample()

Can anyone point me in the right direction?

Thanks for your help!

(Note that this is a very simplified version of what I really want to achieve)

I think I can answer 3: if you want to include correlations in your prior, you need to create a covariance matrix, which contains the information about the correlations between the variables (covariance is a measure of how the variables vary together). Then you load that covariance matrix into a multivariate distribution, which in your case would be:
cfs = pm.MvNormal("cfs", mu=[100, 100, 100], cov=your_covariance_matrix, shape=3)

Great, thanks for the swift help! I tried this multiple times and it didn't work because I had forgotten to set the shape parameter… THANKS!

Random remark, are you sure you want to model i/r as normal, and allow it to be negative? Maybe you want a half-normal.

Yes, you are right. I will not model it as normal. This was just a toy example and the normal prior does not make sense here. I should have used a different prior to avoid causing confusion.

I have a follow-up PyMC3 question. What would I have to do if I want to set the correlation between variables that are not all normal but a mixture of different kinds of distributions? In that case it would not be appropriate to use a multivariate normal (pm.MvNormal), as it only generates normal marginals. Is there something like a multivariate distribution that can output different kinds of marginals? I would like to do something similar to what is possible with Crystal Ball, where you can correlate input variables that have different distributions → https://www.oracle.com/docs/tech/middleware/correlated-assumptions.pdf

I think the easiest option is to move the correlation out of the output distributions and into the structure of the model. For example, this paper (implemented in PyMC here) uses latent structure to model conditionally independent Poisson variables, which end up being correlated due to the hierarchical model structure.
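As a toy illustration of that idea (plain numpy, not the paper's model): two Poisson variables that share a latent factor are conditionally independent given the factor, but marginally correlated. The loadings below are made up:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000

# Shared latent factor drives the rates of both counts
z = rng.normal(size=n)
lam1 = np.exp(0.5 + 0.8 * z)   # assumed intercepts and loadings
lam2 = np.exp(0.2 + 0.8 * z)

# Conditionally independent Poisson draws given z
y1 = rng.poisson(lam1)
y2 = rng.poisson(lam2)

# Marginally, the counts are correlated purely through z
print(np.corrcoef(y1, y2)[0, 1])
```

In a PyMC model you would put a prior on z (and on the loadings) and let the hierarchy induce the correlation, while the likelihood stays whatever family each output needs.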

I think the other alternative would be to work with copula models. @jonsedar has been doing a lot of work on these lately, I think he might be able to chime in with some general remarks on the subject?

@jessegrabowski - Thank you very much for the links, they were really helpful :+1:


You rang? Yes, I've been doing a few things with copulas on marginals recently. Lots of bashing my head against the wall, but I could try to help if you want.

Was just curious if you had any comments on the usefulness of directly modeling correlation between RVs by using copulas vs injecting conditional correlation via model structure.

Disclaimer: my knowledge of copulas begins and ends with how to spell the word.

I’m probably so far down the copula rabbit hole that (potentially better) alternatives are a distant hope… sunk costs and all that :smiley:

I think copulas are quite nice when you want to allow the marginals to correlate in a pooled fashion, regardless of the sub-models on each marginal; my hope is that this gains stability and flexibility in the design of the sub-models, and also makes it easier to use non-Gaussian copulas.

The alternative would be to require several features to form the sub-models of both marginals, and to correlate the sub-model coefficients. I think this would be a good example: McElreath, 2014, where he uses an MvN to correlate hierarchical hyperparameters. I'm not sure if/how one could achieve this with a non-Gaussian copula…
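For anyone landing here later, the basic Gaussian-copula recipe (similar in spirit to what Crystal Ball's correlated assumptions do) can be sketched with scipy: draw correlated standard normals, push them through the normal CDF to get correlated uniforms, then through the inverse CDFs of whatever marginals you like. The correlation and the marginals below are invented:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 50_000

# 1. Correlated standard normals carry the dependence structure
corr = np.array([[1.0, 0.7],
                 [0.7, 1.0]])
z = rng.multivariate_normal(mean=[0, 0], cov=corr, size=n)

# 2. Normal CDF -> correlated uniforms on [0, 1]
u = stats.norm.cdf(z)

# 3. Inverse CDFs of arbitrary marginals (gamma and lognormal here)
x1 = stats.gamma.ppf(u[:, 0], a=2.0, scale=50.0)
x2 = stats.lognorm.ppf(u[:, 1], s=0.25, scale=100.0)

# x1 and x2 keep their chosen marginals but are rank-correlated
```

Note that the resulting Pearson correlation of x1 and x2 is not exactly 0.7; a Gaussian copula preserves rank correlation, not linear correlation, across the marginal transforms.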
