Estimating Distribution of Transformed Variables

@ricardoV94 could collect_default_updates be confused because of how the multivariate RV is getting split up?

Thanks for your advice. But I am working on a Windows system. Is there a way to achieve this?

Nutpie uses Numba, so it works fine on Windows. You can also install the Windows Subsystem for Linux (WSL) to run JAX on Windows.

@Karie @jessegrabowski I am not familiar with the time series in question, but the issue I see has to do with splitting the MvNormal in the scan step. In general, PyMC does not know the probability of pm.MvNormal.dist(...)[0], even if you have both [0] and [1] in the same graph.
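To make the point concrete, here is a small scipy sketch (not PyMC code) of why the split components cannot each be assigned an independent density: for a correlated bivariate normal, the joint log-density is not the sum of the two marginal log-densities, so knowing the marginal of each indexed component is not enough to recover the joint probability.

```python
import numpy as np
from scipy.stats import multivariate_normal, norm

rho = 0.8
cov = np.array([[1.0, rho], [rho, 1.0]])
x = np.array([1.0, -0.5])

# Joint log-density of the correlated pair
joint = multivariate_normal(mean=np.zeros(2), cov=cov).logpdf(x)

# Sum of the two marginal log-densities, i.e. what treating x[0]
# and x[1] as independent draws would give
marginals = norm.logpdf(x[0]) + norm.logpdf(x[1])

print(joint, marginals)  # these differ whenever rho != 0
```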

I tried to rewrite it like this. I don’t know if the parameters are reasonable; I had to keep V > 0 to avoid taking the square root of a negative number:

import pymc as pm
import pytensor
import pytensor.tensor as pt
import numpy as np

from pymc.pytensorf import collect_default_updates

T = 10
dt = 0.1

def heston_dist(V0, cov_matrix, kappa, theta, mu, size):
    # `size` is part of the CustomDist dist signature, unused here
    def step(prev_V_y, cov_matrix, kappa, theta, mu):
        prev_V = prev_V_y[0]
        # Correlated shocks for (V, y), drawn jointly at each step
        epsilons = pm.MvNormal.dist(mu=0, cov=cov_matrix)
        new_V_y = epsilons * pt.sqrt(prev_V * dt) + pt.stack([
            (1 - kappa * dt) * prev_V + kappa * theta * dt,  # variance drift
            - 0.5 * prev_V * dt + mu * dt,                   # log-return drift
        ])
        # Pass the RNG updates from the MvNormal draw back to scan
        return new_V_y, collect_default_updates([new_V_y])

    V_y, updates = pytensor.scan(
        fn=step,
        sequences=None,
        outputs_info=[V0],
        non_sequences=[cov_matrix, kappa, theta, mu],
        n_steps=T,
    )

    return V_y

V_y = pm.CustomDist.dist(pt.as_tensor([100., 100.]), pt.eye(2), 0.5, 1, 0.1, dist=heston_dist)
V_y_draws = pm.draw(V_y, random_seed=5)
print(V_y_draws)
V_y_logp = pm.logp(V_y, V_y_draws).eval()
print(V_y_logp)
[[94.55158601 -4.90335027]
 [90.70884389 -6.29162883]
 [93.34007854 -1.43847036]
 [88.94248107 -1.97778869]
 [85.97464178 -4.41327968]
 [79.49711185 -8.52657717]
 [82.0723072  -2.53941466]
 [75.79844901 -6.28833271]
 [66.24285817 -2.3678411 ]
 [63.58058006  0.90200788]]

[-4.15325839 -4.25231336 -7.35995992 -4.45863859 -4.13815517 -5.32269512
 -6.69617951 -4.53665335 -6.22594932 -5.08987744]

I am not sure about the model. It seems you are generating two correlated variables but only observing one of them. Is it correct that you would only use the second column (y) in your model for conditioning?

If that’s the case, you need to implement the probability manually, because you need to marginalize over the unobserved V; PyMC can’t do that automatically for you. Or you need to sample those entries. I don’t know which of the two you were thinking of.
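As a toy illustration of what marginalizing means (using a made-up scale-mixture model, not the Heston model itself): p(y) = ∫ p(y | V) p(V) dV, which can be estimated by Monte Carlo over prior draws of V when no closed form exists.

```python
import numpy as np

rng = np.random.default_rng(0)

def normal_pdf(y, sd):
    """Density of a zero-mean normal with standard deviation sd."""
    return np.exp(-0.5 * (y / sd) ** 2) / (sd * np.sqrt(2 * np.pi))

# Toy model: V ~ Gamma(2, 1), y | V ~ Normal(0, sqrt(V)).
# Marginalize V by simulation: p(y) ≈ mean of p(y | V_s) over prior draws V_s.
S = 200_000
V_samples = rng.gamma(shape=2.0, scale=1.0, size=S)

y_obs = 0.7
p_y = normal_pdf(y_obs, np.sqrt(V_samples)).mean()
print(p_y)
```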

It would be nice to see the actual equations you’re implementing @Karie

Thanks very much for your reply. @ricardoV94 @jessegrabowski

I was implementing the Heston stochastic volatility model.
The general formula is:

dS_t = mu S_t dt + sqrt(V_t) S_t dW_t^S
dV_t = kappa (theta - V_t) dt + sigma sqrt(V_t) dW_t^V,   Corr(dW^S, dW^V) = rho

Discretizing the model with an Euler scheme (writing Y_t = log S_t):

Y_{t+1} = Y_t + (mu - V_t / 2) dt + sqrt(V_t dt) eps_t^Y
V_{t+1} = (1 - kappa dt) V_t + kappa theta dt + sigma sqrt(V_t dt) eps_t^V

For more details, see pages 3-4 of the following PDF:
Estimating Hestons and Bates models parameters using Markov chain Monte Carlo simulation.pdf (501.7 KB)

Thanks in advance!
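For anyone following along, here is a plain-NumPy forward simulation of that discretization (Euler scheme with full truncation to keep the variance nonnegative; the parameter values are made up for illustration, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(42)

# Illustrative parameter values
mu, kappa, theta, sigma, rho = 0.05, 1.5, 0.04, 0.3, -0.7
dt, n_steps = 1 / 252, 252

# Correlated shocks via the Cholesky factor of the correlation matrix
L = np.linalg.cholesky(np.array([[1.0, rho], [rho, 1.0]]))
eps = rng.standard_normal((n_steps, 2)) @ L.T  # columns: (eps_Y, eps_V)

Y = np.empty(n_steps + 1)  # log-price
V = np.empty(n_steps + 1)  # variance
Y[0], V[0] = np.log(100.0), theta

for t in range(n_steps):
    v = max(V[t], 0.0)  # full truncation keeps the sqrt well-defined
    Y[t + 1] = Y[t] + (mu - 0.5 * v) * dt + np.sqrt(v * dt) * eps[t, 0]
    V[t + 1] = V[t] + kappa * (theta - v) * dt + sigma * np.sqrt(v * dt) * eps[t, 1]

print(Y[-1], V[-1])
```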

Indeed, I have generated two correlated random variables, but Y is derived from V. So isn’t it sufficient to have observed data for Y only?

You need to either infer or marginalize the variable V, because Y is not completely independent of it.