 # How to sample efficiently from time series data?

I have the following time series model

``````
S[t+1] = S[t] - B[t]
E[t+1] = E[t] + B[t] - C[t]
I[t+1] = I[t] + C[t] - D[t]

B[t] ~ Binom(S[t], beta)
C[t] ~ Binom(E[t], gamma)
D[t] ~ Binom(I[t], delta)
beta, gamma, delta ~ Beta(a, b)
``````
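For concreteness, the generative process can be simulated forward with NumPy (a sketch; the initial counts and horizon are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
a, b = 2, 10                       # Beta hyperparameters from the model
beta, gamma, delta = rng.beta(a, b, size=3)

S, E, I = 100, 50, 25              # initial counts (illustrative)
C_obs, D_obs = [], []
for t in range(8):
    B = rng.binomial(S, beta)      # S -> E transitions (latent)
    C = rng.binomial(E, gamma)     # E -> I transitions (observed)
    D = rng.binomial(I, delta)     # I -> removed transitions (observed)
    S, E, I = S - B, E + B - C, I + C - D
    C_obs.append(C)
    D_obs.append(D)
```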

Only C and D are observed and we know S, E, I, a and b. What is the best way to represent this model in PyMC3?

Currently I do the following:

``````
import numpy as np
import pymc3 as pm

with pm.Model() as model:
    s0, e0, i0 = 100, 50, 25
    C = np.array([3, 5, 8, 13, 21, 26, 10, 3])
    D = np.array([1, 2, 3, 7, 9, 11, 5, 1])
    beta = pm.Beta('beta', 2, 10)
    gamma = pm.Beta('gamma', 2, 10)
    delta = pm.Beta('delta', 2, 10)

    B = []
    st, et, it = s0, e0, i0
    for t in range(len(C)):
        bt = pm.Binomial(f'b_{t}', st, beta)
        ct = pm.Binomial(f'c_{t}', et, gamma, observed=C[t])
        dt = pm.Binomial(f'd_{t}', it, delta, observed=D[t])

        st = st - bt
        et = et + bt - ct
        it = it + ct - dt
        B.append(bt)

    step1 = pm.Metropolis(B)
    step2 = pm.NUTS([beta, gamma, delta])
    trace = pm.sample(1000, tune=1000, step=[step1, step2])
``````

However, this creates one Metropolis sampler for each `B[t]` and does not scale well for a long time series. I would like to sample the entire `B` at once with a single sampler. I also have a transition distribution for `B` in mind that might help convergence.

How can I do it with PyMC3?

You need to rewrite the for loop as a `theano.scan` for performance.

You can specify a custom proposal distribution via the `proposal_dist` kwarg of `pm.Metropolis` (see https://docs.pymc.io/api/inference.html?highlight=metropolis#module-pymc3.step_methods.metropolis).

Something along the lines of:

``````
import numpy as np
import pymc3 as pm
import theano

s0, e0, i0 = 100., 50., 25.
st0, et0, it0 = [theano.shared(x) for x in [s0, e0, i0]]

C = np.array([3, 5, 8, 13, 21, 26, 10, 3], dtype=np.float64)
D = np.array([1, 2, 3, 7, 9, 11, 5, 1], dtype=np.float64)

def seir_one_step(st0, et0, it0, beta, gamma, delta):
    # propagate the expected counts one time step
    bt0 = st0 * beta
    ct0 = et0 * gamma
    dt0 = it0 * delta

    st1 = st0 - bt0
    et1 = et0 + bt0 - ct0
    it1 = it0 + ct0 - dt0
    return st1, et1, it1

with pm.Model() as model:
    beta = pm.Beta('beta', 2, 10)
    gamma = pm.Beta('gamma', 2, 10)
    delta = pm.Beta('delta', 2, 10)

    (st, et, it), updates = theano.scan(
        fn=seir_one_step,
        outputs_info=[st0, et0, it0],
        non_sequences=[beta, gamma, delta],
        n_steps=len(C))

    ct = pm.Binomial('c_t', et, gamma, observed=C)
    dt = pm.Binomial('d_t', it, delta, observed=D)

    trace = pm.sample()

with model:
    bt = pm.Binomial('b_t', st, beta, shape=len(C))
    ppc_trace = pm.sample_posterior_predictive(trace, var_names=['b_t'])
``````

If you are interested in SEIR-like models, Prediction of Danish Covid19 cases is a great example.


Thank you for your response! I have a few clarifying questions. If I understand correctly, the following code

``````
def seir_one_step(st0, et0, it0, beta, gamma, delta):
    bt0 = st0 * beta
    ct0 = et0 * gamma
    dt0 = it0 * delta

    st1 = st0 - bt0
    et1 = et0 + bt0 - ct0
    it1 = it0 + ct0 - dt0
    return st1, et1, it1
``````

uses the mean value of each `B[t]`, `C[t]` and `D[t]` to propagate `S`, `E` and `I` for `len(C)` time steps. Prior to sampling, we specify that `C[t]` and `D[t]` have binomial distributions with the given observed data. However, we do not specify the distribution of `B` before the line

``````
trace = pm.sample()
``````

Here are my questions:

1. We are not directly using B, C or D to update S, E and I (`seir_one_step` uses only mean values). How, then, does the observed data affect the values of the parameters?
2. How does the model know that `B[t]` has a binomial likelihood? And how does it impute `B`?

The information is connected through `ct = pm.Binomial('c_t', et, gamma, observed=C)`, where the output variable `et` is produced by the `theano.scan`.

It does not, but since you are not observing `B[t]` anyway, it doesn't really matter to have a precise likelihood for it.
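As a sanity check on the mean-value propagation: because the updates are linear in the counts, the deterministic step in `seir_one_step` tracks the expectation of the stochastic binomial chain exactly. A NumPy sketch (parameter values are made up):

```python
import numpy as np

rng = np.random.default_rng(1)
beta, gamma, delta = 0.2, 0.3, 0.25    # made-up rates
s0, e0, i0, T = 100.0, 50.0, 25.0, 8

# deterministic propagation of expected counts (what seir_one_step does)
s, e, i = s0, e0, i0
for _ in range(T):
    b, c, d = s * beta, e * gamma, i * delta
    s, e, i = s - b, e + b - c, i + c - d

# Monte Carlo average of the actual binomial chain
n = 5000
S = np.full(n, int(s0)); E = np.full(n, int(e0)); I = np.full(n, int(i0))
for _ in range(T):
    B = rng.binomial(S, beta); C = rng.binomial(E, gamma); D = rng.binomial(I, delta)
    S, E, I = S - B, E + B - C, I + C - D

# S.mean(), E.mean(), I.mean() should closely match s, e, i
```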

I know that I am late to this question, and I am asking only a remotely related question, but I am trying to do something similar to light_kira, and the inner workings of `tt.scan` confuse me.

I have dumbed down the above functionality to:

``````
s0 = 100.

C = np.array([3, 5, 8, 13, 21, 26, 10, 3], dtype=np.float64)
D = np.array([1, 2, 3, 7, 9, 11, 5, 1], dtype=np.float64)

def seir_one_step(st0):
    st1 = st0 + tt.as_tensor_variable(1.0)
    return st1

sk0 = [tt.dscalar() for x in [s0]]

# (the theano.scan and theano.function calls that define `f` were not shown)
f(1.0)
``````
which returns:
``````
array([[2.],
       [3.],
       [4.],
       [5.],
       [6.],
       [7.],
       [8.],
       [9.]])
``````

But this outputs a 2d tensor.
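For what it's worth, `scan` stacks each step's output along a new leading time axis, so the result has one more dimension than the per-step state: a scalar state gives an `(n_steps,)` vector, while a `(1,)`-shaped state gives an `(n_steps, 1)` matrix. A NumPy analogy (a sketch; `scan_like` is a made-up helper, not a theano API):

```python
import numpy as np

def scan_like(fn, outputs_info, n_steps):
    # mimic theano.scan: iterate fn, stacking results on a new leading axis
    state = outputs_info
    outs = []
    for _ in range(n_steps):
        state = fn(state)
        outs.append(state)
    return np.stack(outs)

out1 = scan_like(lambda s: s + 1.0, np.float64(1.0), 8)   # shape (8,)
out2 = scan_like(lambda s: s + 1.0, np.array([1.0]), 8)   # shape (8, 1)
```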

The only way I seem to be able to generate a 1d tensor is by introducing a dummy second variable like this:

``````
s0 = 100.
st0 = [theano.shared(x) for x in [s0]]

C = np.array([3, 5, 8, 13, 21, 26, 10, 3], dtype=np.float64)
D = np.array([1, 2, 3, 7, 9, 11, 5, 1], dtype=np.float64)

def seir_one_step(st0, dt0):
    st1 = st0 + tt.as_tensor_variable(1.0)
    return st1, tt.as_tensor_variable(np.array(1.0, dtype=np.float64))

sk0, dk0 = [tt.dscalar() for x in [s0, 1.0]]
``````