Estimating parameters of Random Volatility Models using `pytensor.scan` and `pm.CustomDist`

Karie · August 10, 2023, 3:02am

Hello everyone,

I’m currently working on fitting a random volatility model using PyMC. Here’s my code:

import multiprocessing
import numpy as np
import pandas as pd
import pymc as pm
from pymc.pytensorf import collect_default_updates
import xarray as xr
import arviz as az
import pytensor
import pytensor.tensor as pt


def main():

    BURNIN = 5000
    N = 15000
    dt = 1/240
    

    stock_price_data = pd.read_csv("convvalue_data//2023-05-12&110043.SH.csv")
    Y = np.diff(np.log(stock_price_data["convvalue"]))
    T = len(Y)
   
    
    def heston_dist(V0, cov_matrix, kappa, theta, mu, sigma_err, size):
        
        def step(*args):
            V_tml, cov_matrix, kappa, theta, mu, sigma_err = args
            
            epsilons = pm.MvNormal.dist(mu=0, cov=cov_matrix)
            
            V = epsilons[1] * pt.sqrt(V_tml * dt) + (1 - kappa * dt) * V_tml + kappa * theta * dt
            
            mu_Y = epsilons[0] * pt.sqrt(V_tml * dt) - 0.5 * V_tml * dt + mu * dt
            y = pm.Normal.dist(mu=mu_Y, sigma=sigma_err)
            
            return (V, y), collect_default_updates(inputs=args, outputs=[y])
        
        (V, y), updates = pytensor.scan(fn=step, 
                                        outputs_info=[V0, None], 
                                        non_sequences=[cov_matrix, kappa, theta, mu, sigma_err],
                                        strict=True,
                                        n_steps=T)
        
        return y
    
    
    def calc_cov_matrix(phi, omega):
        cov_matrix = pm.math.stack([
            [1., phi],
            [phi, phi**2 + omega]
        ])
        return cov_matrix
        
    
    with pm.Model() as mod:    
        
        mu = pm.Normal('mu', 0, sigma=1)
        kappa = pm.Normal('kappa', 0, sigma=1)
        theta = pm.Normal('theta', 0, sigma=1)
        omega = pm.InverseGamma('omega', alpha=2, beta=0.005)
        
        std_dev = pm.Deterministic('std_dev', omega / 2)
        phi = pm.Normal('phi', 0, sigma=std_dev)
        
        V0 = pm.TruncatedNormal('V0', mu=0.0225, sigma=0.005, lower=0)
        
        cov_matrix = pm.Deterministic('cov_matrix', calc_cov_matrix(phi, omega))
        
        y_obs =  pm.ConstantData('y_obs', Y)
        sigma_err = pm.HalfNormal('sigma', 0.1) # variance
        
        y = pm.CustomDist('obs', 
                          V0, cov_matrix, kappa, theta, mu, sigma_err,
                          dist=heston_dist, 
                          observed=y_obs)
        
        idata = pm.sample()

if __name__ == '__main__':
    multiprocessing.freeze_support()
    main()

I have a few questions:

My observed data is related to Y. Do I need to convert Y into pm.ConstantData or pm.MutableData? What’s the difference between the two?
An issue has been bothering me for quite a while. In the scan function, Y is obtained through T sets of random numbers following a bivariate normal distribution epsilons. As a result, after each iteration, I should have T values of Y, forming a time series of length T. Meanwhile, my observed data for Y also has a length of T. My goal is to fit the generated Y using the observed data in order to estimate the parameters. I’m wondering if the pm.CustomDist function I’ve written can address this problem effectively?
When I run the code, I encounter the following error. But I don’t think I have missed any input.

pytensor.graph.utils.MissingInputError: OpFromGraph is missing an input: *1-<RandomGeneratorType>

Karie · August 10, 2023, 3:05am

To specific Question 2, I am not sure whether using y = pm.Normal.dist(mu=mu_Y, sigma=sigma_err) can meet my goals or not. After all, it is not a prediction model but a Stochastic Model.

jessegrabowski · August 10, 2023, 2:09pm

I’ll try to look at your model more carefully later tonight, but there’s a tutorial on PyMC data containers here that should answer your question 1 in excruciating detail. This notebook of a GARCH model might help with question 2?

Karie · August 11, 2023, 4:16am

I sucessfully run my codes without pm.CustomDist.

    def step(*args):
        epsilons, V_tml, kappa, theta = args
        V = epsilons[1] * pt.sqrt(V_tml * dt) + (1 - kappa * dt) * V_tml + kappa * theta * dt
        return V
    
    def heston_pt(epsilons, V0, kappa, theta, mu):
        
        V, updates = pytensor.scan(fn=step,
                                   sequences=[epsilons],
                                   outputs_info=[V0],
                                   non_sequences=[kappa, theta],
                                   strict=True,
                                   n_steps=T
            )

        y = epsilons[1:, 0] * pt.sqrt(V[:-1] * dt) - 0.5 * V[:-1] * dt + mu * dt
        return y

    def make_model(Y, T, mode=None):
        
        with pm.Model() as mod:    
            mu = pm.Normal('mu', 0, sigma=1)
            kappa = pm.Normal('kappa', 0, sigma=1)
            theta = pm.Normal('theta', 0, sigma=1)
            omega = pm.InverseGamma('omega', alpha=2, beta=0.005)
            
            std_dev = pm.Deterministic('std_dev', omega / 2)
            phi = pm.Normal('phi', 0, sigma=std_dev)
            
            V0 = pm.TruncatedNormal('V0', mu=0.0225, sigma=0.005, lower=0)
            
            cov_matrix = pm.Deterministic('cov_matrix', calc_cov_matrix(phi, omega))
            epislons = pm.MvNormal('epislons', mu=np.zeros(2), cov=cov_matrix, shape=(T, 2))
            
            y = pm.Deterministic('y', heston_pt(epislons, V0, kappa, theta, mu))
            
            sigma_err = pm.Exponential('sigma', 0.1)
            obs = pm.Normal('obs', mu=y, sigma=sigma_err, observed=Y[1:])
        
        return mod

But it takes me more than 3 hours to get my results and the divergences seem large.

WARNING (pytensor.tensor.blas): Using NumPy C-API based implementation for BLAS functions.
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [mu, kappa, theta, omega, phi, V0, epislons, sigma]
 |███-| 95.30% [76241/80000 2:46:42<08:13 Sampling 4 chains, 55,771 divergences]

I wonder is it normal that it takes so much time? How can I speed up this sampler progress within Windows system.

Any advice will be dearly appreciated.

jessegrabowski · August 11, 2023, 7:16am

The pytensor.tensor.blas warning says your sampling is going to be slow. Did you install PyMC using the recommended instructions?

It will also be better if you can keep all your questions about this model in a single thread, so the answers aren’t scattered into so many places.

Karie · August 11, 2023, 8:10am

Thanks for your reply!

I installed pymc by pip install pymc.

Topic		Replies	Views
Estimating Distribution of Transformed Variables v5 modeling	28	1306	August 14, 2023
Mu of pm.Normal() depends on output of RandomVariable on previous time step. How do I implement this? v5 time_series , modeling	8	731	July 26, 2023
Error with custom distribution after using scan v5 bug	11	125	February 24, 2025
Using CustomDist as a stochastic variable - causing TensorType error v5 pytensor	6	349	December 11, 2023
Declaring Priors using Loops for State Space Models	11	1024	August 7, 2024

Estimating parameters of Random Volatility Models using `pytensor.scan` and `pm.CustomDist`

Related topics