** I edit as I have found the problem, I wasn’t stating the “observed” variable in the likelihood (y), but it would still be nice to have some advice on how to improve sampling, as estimates are still slightly off. My apologies, I’ll try not to make a habit of asking for help before finding a solution. **
Hi all. I’m trying to implement a simple SIR model on PYMC, based on this Stan article: Bayesian workflow for disease transmission modeling in Stan
At the moment I’m trying to implement it using Sunode GitHub - pymc-devs/sunode: Solve ODEs fast, with support for PyMC
However, my current attempt completely fails, the model does not converge (ESSs are very low) even with 2000 tuning/samples and target_accept = 0.95, gives estimates of beta, gamma, R0, and recovery time that are quite slightly off.
I must be doing something wrong, as it takes more than 10 minutes to sample for just 14 timepoints (i.e. 14 days). (Especially as Sunode cannot use multicore for Windows).. I’m still unsure whether the model is optimal, but I failed to find any simple examples in PyMC v5 (rather than PyMC3). So, any help or advice will be greatly appreciated. Many thanks in advance.
# -*- coding: utf-8 -*-
import numpy as np
import pymc as pm
import pytensor.tensor as at
import pandas as pd
import sunode
import sunode.wrappers.as_pytensor
import arviz as az
#Data from a 14 days influenza outbreak in a British boarding school with 763 male students
df = pd.read_csv("influenza_england_1978_school.csv")
infected = df.in_bed #cases
n_days = len(infected)
Ntot = 763 #total number of students
times = np.arange(n_days)
t_0 = 0
#use Sunode ODE solver for SIR odes model, p=parameter, y=variable, t=the over variable
def SIR(t, y, p):
return {
#'N': y.N, #N is the number of succeptibles + infected + recovered (i.e. the whole population)
'S': -(p.beta * y.S * y.I) / Ntot, #ODE for succeptible
'I': ((p.beta * y.S * y.I) / Ntot) - p.gamma * y.I, #ODE for infected
'R': p.gamma * y.I, #ODE for recovered
}
with pm.Model() as mod:
beta = pm.TruncatedNormal("beta", 2, 1, lower=0) #average number of contact per person per time parameter
gamma = pm.TruncatedNormal("gamma", 0.4, 0.5, lower=0) #inverse of recovery time parameter
y_hat, _, problem, solver, _, _ = sunode.wrappers.as_pytensor.solve_ivp(
y0={
'S': (Ntot-1, ()), #starting point of I
'I': (1, ()), #starting point of S
'R': (0, ()), #starting point of R
#'N': (Ntot, ()), #starting point of N
},
params={
'beta': (beta, ()),
'gamma': (gamma, ()),
'extra': np.zeros(1),
},
rhs=SIR,
tvals=times,
t0=0,
)
succ_estim = pm.Deterministic('Se', y_hat['S'])
infec_estim = pm.Deterministic('Ie', y_hat['I'])
recov_estim = pm.Deterministic('Re', y_hat['R'])
R0 = pm.Deterministic("R0", beta/gamma)
recovery_time = pm.Deterministic("recovery time", 1/gamma)
phi_inv = pm.Exponential("phi_inv", 1)
y = pm.NegativeBinomial("y", mu=infec_estim, alpha=phi_inv, observed=infected)
with mod:
idata = pm.sample(tune=2000, draws=2000, chains=4, cores=1, target_accept=0.95)
summ = az.summary(idata, var_names=['beta', 'gamma', 'R0', 'recovery time'])
summ
#pos = idata.stack(sample = ['chain', 'draw']).posterior
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Sequential sampling (4 chains in 1 job)
NUTS: [beta, gamma, phi_inv]
|█████████████| 100.00% [4000/4000 01:57<00:00 Sampling chain 3, 0 divergences]Sampling 4 chains for 2_000 tune and 2_000 draw iterations (8_000 + 8_000 draws total) took 418 seconds.
Out[1]:
mean sd hdi_3% ... ess_bulk ess_tail r_hat
beta 2.098 0.131 1.873 ... 4676.0 3123.0 1.0
gamma 0.510 0.061 0.391 ... 4667.0 4226.0 1.0
R0 4.187 0.662 3.117 ... 4308.0 3403.0 1.0
recovery time 1.990 0.245 1.562 ... 4667.0 4226.0 1.0
[4 rows x 9 columns]
[CVODEA ERROR] CVodeF
At t = 0.101077, mxstep steps taken before reaching tout.
[CVODEA ERROR] CVodeF
At t = 0.101077, mxstep steps taken before reaching tout.
type or paste code here
PS: I also don’t quite understand the errors given by Sunode. Any info would be great, many thanks again.