How to turn a model into a bayesian model?

Hello,

I’m trying to take a standard time series model and change it into a bayesian format. Currently, I have it coded it to iterate through each row to produce a forecast, trend, level, season, etc.

Here is the standard exponential smoothing model:

def triple_exp_smooth_add(d, slen=12, extra_periods=1, alpha=0.4, beta=0.4, phi=0.9, gamma=0.3):

    cols = len(d) # Historical pteriod length
    d = np.append(d,[np.nan]*extra_periods) # Append np.nan into the demand array to cover future periods
    
    # components initialization     
    f,a,b,s = np.full((4,cols+extra_periods),np.nan)
    s = seasonal_factors_add(s,d,slen,cols)
        
    # Level & Trend initialization
    a[0] = d[0]-s[0]
    b[0] = (d[1]-s[1]) - (d[0]-s[0])
           
    # Create the forecast for the first season
    for t in range(1,slen):
        f[t] = a[t-1] + phi*b[t-1] + s[t]       
        a[t] = alpha*(d[t]-s[t]) + (1-alpha)*(a[t-1]+phi*b[t-1])       
        b[t] = beta*(a[t]-a[t-1]) + (1-beta)*phi*b[t-1]
                
    # Create all the t+1 forecast
    for t in range(slen,cols):
        f[t] = a[t-1] + phi*b[t-1] + s[t-slen]       
        a[t] = alpha*(d[t]-s[t-slen]) + (1-alpha)*(a[t-1]+phi*b[t-1])       
        b[t] = beta*(a[t]-a[t-1]) + (1-beta)*phi*b[t-1]         
        s[t] = gamma*(d[t]-a[t]) + (1-gamma)*s[t-slen] 
        
    # Forecast for all extra periods
    for t in range(cols,cols+extra_periods):
        f[t] = a[t-1] + phi*b[t-1] + s[t-slen]
        a[t] = f[t]-s[t-slen]
        b[t] = phi*b[t-1] 
        s[t] = s[t-slen]
                      
    df = pd.DataFrame.from_dict({'Demand':d,'Forecast':f,'Level':a,'Trend':b,'Season':s,'Error':d-f})

    return df

I tried to put that same model into the PyMC framework as follows:

def seasonal_factors_add(s,d,slen,cols):	   
    for i in range(slen):  
        idx = [x for x in range(cols) if x%slen==i] # Indices that correspond to this season
        s[i] = np.mean(d[idx]) # Calculate season average        
    s -= np.mean(s[:slen]) # Scale all season factors (sum of factors = 0)  
    return s
slen=12
t = df_sat['t']
eaches = np.array(df_sat['eaches'])
cols = len(list(df_sat['eaches']))
extra_periods = 12
f,a,b,s = np.full((4,cols+extra_periods),np.nan)
    
s = seasonal_factors_add(s,eaches,slen,cols)
a[0] = eaches[0]-s[0]
b[0] = (eaches[1]-s[1]) - (eaches[0]-s[0])


with pm.Model() as tesa_model:
    a_ =  pm.MutableData('a', a)
    b_ = pm.MutableData('b', b)
    s_ = pm.MutableData('s', s)
    t_ = pm.MutableData('t', t)
    eaches_ = pm.MutableData('eaches', eaches)
    
    
    alpha = pm.Normal('alpha', 0, 1,)
    beta = pm.Normal('beta', 0, 1)
    phi = pm.Normal('phi', 0, 1)
    gamma = pm.Normal('gamma', 0, 1)
    
    for t in range(1,slen):
        f[t] = a[t-1] + phi*b[t-1] + s[t]       
        a[t] = alpha*(d[t]-s[t]) + (1-alpha)*(a[t-1]+phi*b[t-1])       
        b[t] = beta*(a[t]-a[t-1]) + (1-beta)*phi*b[t-1]
    
    
    # Create all the t+1 forecast
    for t in range(slen,cols):
        f[t] = a[t-1] + phi*b[t-1] + s[t-slen]       
        a[t] = alpha*(d[t]-s[t-slen]) + (1-alpha)*(a[t-1]+phi*b[t-1])       
        b[t] = beta*(a[t]-a[t-1]) + (1-beta)*phi*b[t-1]         
        s[t] = gamma*(d[t]-a[t]) + (1-gamma)*s[t-slen]
    
    predictions = np.exp(f)
                      
    pm.Poisson('obs',
              predictions,
              observed = eaches_)


    trace = pymc.sampling_jax.sample_numpyro_nuts(tune=2000, draws = 2000)

That produced this error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
TypeError: float() argument must be a string or a number, not 'TensorVariable'

The above exception was the direct cause of the following exception:

ValueError                                Traceback (most recent call last)
/tmp/ipykernel_7606/3401890868.py in <module>
     31 
     32     for t in range(1,slen):
---> 33         f[t] = a[t-1] + phi*b[t-1] + s[t]
     34         a[t] = alpha*(d[t]-s[t]) + (1-alpha)*(a[t-1]+phi*b[t-1])
     35         b[t] = beta*(a[t]-a[t-1]) + (1-beta)*phi*b[t-1]

ValueError: setting an array element with a sequence.

So I’m guessing I can’t iterate through each row/element of an array, but what’s the best way to change this model into the bayesian frame work using PyMC?

If this helps, here is a standard dataset.

d= [28,19,18,13,19,16,19,18,13,16,16,11,18,15,13,15,13,11,13,10,12]

Or at best, how do I refer to the previous element in these arrays inside a model object?

The proximate problem is that the quantity a[t-1] + phi*b[t-1] + s[t-slen] is an aesara symbolic tensor, not a number, so numpy isn’t letting you store it in f. But the ultimate problem is that looping over the data this way isn’t going to give you what you want, because you aren’t taking into account the “innovations” at each time step, which all need to be estimated.

Copying from Forecasting: Principals and Practice, it looks like your model is an additive seasonal model with dampened trend, something like:

\begin{align} y_t &= \ell_{t-1} + \phi b_{t-1} + s_{t-m} + \epsilon_t \\ \ell_t &= \ell_{t-1} + \phi b_{t-1} + \alpha \epsilon_t \\ b_t &= \phi b_{t-1} + \beta \epsilon_t \\ s_t &= s_{t-m} + \gamma \epsilon_t \end{align}

Which can be put into state space form. For concreteness, let’s say the seasonal lag m=3, otherwise I have to type out a bunch of dots and stuff in the matrices.

A state space model is written:

\begin{align}x_{t+1} &= T x_t + R \epsilon_{t+1} \\ y_t &= Z x_t + H \eta_t \end{align}

with:

x_t = \begin{bmatrix} \epsilon_t \\ \ell_t \\ b_t \\ s_t \\ s_{t-1} \\ s_{t-2} \end{bmatrix}, T = \begin{bmatrix} 0 & 0 & 0 & 0 & 0 &0 \\ \alpha & 1 & \phi & 0 & 0 & 1\\ \beta & 0 & \phi & 0 & 0 & 0 \\ \gamma & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1& 0 & 0 \end{bmatrix}, R = \begin{bmatrix} 1 \\ 0\\ 0\\ 0 \\ 0 \\ 0 \end{bmatrix}, Z = \begin{bmatrix} 1 & 1 & \phi & 1 & 0 & 0 \end{bmatrix}, H = 0.

This is a nice use-case for my PyMC Statespace “package” (ok it’s just a repo).

Statespace models estimated with a Kalman filter are inherently Bayesian; you always get confidence intervals thanks to the multivariate normal being conjugate with itself. You could also estimate this using statsmodels.api.tsa.statespace.ExponentialSmoothing. Your errors would be underestimated because they won’t account for model uncertainty (i.e., uncertainty about the parameter values themselves), but it’s still quite nice. My thing isn’t really production ready.