# Estimating parameters of Random Volatility Models using `pytensor.scan` and `pm.CustomDist`

Hello everyone,

I’m currently working on fitting a random volatility model using PyMC. Here’s my code:

``````import multiprocessing
import numpy as np
import pandas as pd
import pymc as pm
import xarray as xr
import arviz as az
import pytensor
import pytensor.tensor as pt

def main():

BURNIN = 5000
N = 15000
dt = 1/240

Y = np.diff(np.log(stock_price_data["convvalue"]))
T = len(Y)

def heston_dist(V0, cov_matrix, kappa, theta, mu, sigma_err, size):

def step(*args):
V_tml, cov_matrix, kappa, theta, mu, sigma_err = args

epsilons = pm.MvNormal.dist(mu=0, cov=cov_matrix)

V = epsilons * pt.sqrt(V_tml * dt) + (1 - kappa * dt) * V_tml + kappa * theta * dt

mu_Y = epsilons * pt.sqrt(V_tml * dt) - 0.5 * V_tml * dt + mu * dt
y = pm.Normal.dist(mu=mu_Y, sigma=sigma_err)

outputs_info=[V0, None],
non_sequences=[cov_matrix, kappa, theta, mu, sigma_err],
strict=True,
n_steps=T)

return y

def calc_cov_matrix(phi, omega):
cov_matrix = pm.math.stack([
[1., phi],
[phi, phi**2 + omega]
])
return cov_matrix

with pm.Model() as mod:

mu = pm.Normal('mu', 0, sigma=1)
kappa = pm.Normal('kappa', 0, sigma=1)
theta = pm.Normal('theta', 0, sigma=1)
omega = pm.InverseGamma('omega', alpha=2, beta=0.005)

std_dev = pm.Deterministic('std_dev', omega / 2)
phi = pm.Normal('phi', 0, sigma=std_dev)

V0 = pm.TruncatedNormal('V0', mu=0.0225, sigma=0.005, lower=0)

cov_matrix = pm.Deterministic('cov_matrix', calc_cov_matrix(phi, omega))

y_obs =  pm.ConstantData('y_obs', Y)
sigma_err = pm.HalfNormal('sigma', 0.1) # variance

y = pm.CustomDist('obs',
V0, cov_matrix, kappa, theta, mu, sigma_err,
dist=heston_dist,
observed=y_obs)

idata = pm.sample()

if __name__ == '__main__':
multiprocessing.freeze_support()
main()
``````

I have a few questions:

1. My observed data is related to `Y`. Do I need to convert `Y` into `pm.ConstantData` or `pm.MutableData`? What’s the difference between the two?

2. An issue has been bothering me for quite a while. In the `scan` function, `Y` is obtained through `T` sets of random numbers following a bivariate normal distribution `epsilons`. As a result, after each iteration, I should have `T` values of `Y`, forming a time series of length `T`. Meanwhile, my observed data for `Y` also has a length of `T`. My goal is to fit the generated `Y` using the observed data in order to estimate the parameters. I’m wondering if the `pm.CustomDist` function I’ve written can address this problem effectively?

3. When I run the code, I encounter the following error. But I don’t think I have missed any input.

``````pytensor.graph.utils.MissingInputError: OpFromGraph is missing an input: *1-<RandomGeneratorType>
``````

To specific Question 2, I am not sure whether using `y = pm.Normal.dist(mu=mu_Y, sigma=sigma_err)` can meet my goals or not. After all, it is not a prediction model but a Stochastic Model.

I’ll try to look at your model more carefully later tonight, but there’s a tutorial on PyMC data containers here that should answer your question 1 in excruciating detail. This notebook of a GARCH model might help with question 2?

I sucessfully run my codes without `pm.CustomDist`.

``````    def step(*args):
epsilons, V_tml, kappa, theta = args
V = epsilons * pt.sqrt(V_tml * dt) + (1 - kappa * dt) * V_tml + kappa * theta * dt
return V

def heston_pt(epsilons, V0, kappa, theta, mu):

sequences=[epsilons],
outputs_info=[V0],
non_sequences=[kappa, theta],
strict=True,
n_steps=T
)

y = epsilons[1:, 0] * pt.sqrt(V[:-1] * dt) - 0.5 * V[:-1] * dt + mu * dt
return y

def make_model(Y, T, mode=None):

with pm.Model() as mod:
mu = pm.Normal('mu', 0, sigma=1)
kappa = pm.Normal('kappa', 0, sigma=1)
theta = pm.Normal('theta', 0, sigma=1)
omega = pm.InverseGamma('omega', alpha=2, beta=0.005)

std_dev = pm.Deterministic('std_dev', omega / 2)
phi = pm.Normal('phi', 0, sigma=std_dev)

V0 = pm.TruncatedNormal('V0', mu=0.0225, sigma=0.005, lower=0)

cov_matrix = pm.Deterministic('cov_matrix', calc_cov_matrix(phi, omega))
epislons = pm.MvNormal('epislons', mu=np.zeros(2), cov=cov_matrix, shape=(T, 2))

y = pm.Deterministic('y', heston_pt(epislons, V0, kappa, theta, mu))

sigma_err = pm.Exponential('sigma', 0.1)
obs = pm.Normal('obs', mu=y, sigma=sigma_err, observed=Y[1:])

return mod
``````

But it takes me more than 3 hours to get my results and the divergences seem large.

``````WARNING (pytensor.tensor.blas): Using NumPy C-API based implementation for BLAS functions.
Auto-assigning NUTS sampler...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [mu, kappa, theta, omega, phi, V0, epislons, sigma]
|███-| 95.30% [76241/80000 2:46:42<08:13 Sampling 4 chains, 55,771 divergences]
``````

I wonder is it normal that it takes so much time? How can I speed up this sampler progress within Windows system.

Any advice will be dearly appreciated.

The `pytensor.tensor.blas` warning says your sampling is going to be slow. Did you install PyMC using the recommended instructions?

I installed `pymc` by `pip install pymc`.