Why does my new data with length of 12, give me 48 posterior predictive observations?

Hello,

I have a prophet-like time series model. I’ve lopped off the year 2021 for an out of sample test data set. When I draw from the posterior, then use arviz to pull the mean values of the predictions, I’m getting 48 values, not 12. I’m sure it has to do with the 4 chains but I’m calculating the dimensions along the chains and draws.

Here is the model and sample code:

n_changepoints = 8
t = np.arange(len(df_sat)) / len(df_sat)
s = np.linspace(0, np.max(t), n_changepoints+2)[1:-1]

a = (t[:, None] > s)*1

n_components = 10

def yearly_X(t, p=365.25, n=n_components):
    x = 2 * np.pi * (np.arange(n)+1) * t[:, None] / p
    return np.concatenate((np.cos(x), np.sin(x)), axis = 1)

monthly_n_components = 5
def monthly_X(t, p=30.5, n=monthly_n_components):
    x = 2 * np.pi * (np.arange(n)+1) * t[:, None] / p
    return np.concatenate((np.cos(x), np.sin(x)), axis = 1) 

df_sat['promo_flag_month_before'] = df_sat['promo_flag'].shift(1)
df_sat['promo_flag_month_before'] = df_sat['promo_flag_month_before'].fillna(method = 'bfill')
promo = np.array(df_sat['promo_flag_month_before'])

df_sat_test['promo_flag_month_before'] = df_sat_test['promo_flag'].shift(1)
df_sat_test['promo_flag_month_before'] = df_sat_test['promo_flag_month_before'].fillna(method = 'bfill')
promo_test = np.array(df_sat_test['promo_flag_month_before'])

with pm.Model() as model:
    promo_ = pm.MutableData('promo', promo)
    t_ = pm.MutableData('t', t)
    a_ = pm.MutableData('a', a)
    s_ = pm.MutableData('s', s)
    
    k = pm.Normal('k', 0, 1)
    m = pm.Normal('m', 0, 5)
    delta = pm.Laplace('delta', 0, 0.1, shape=n_changepoints)
    
    growth = k + at.dot(a, delta)
    offset = (m + at.dot(a, -s * delta))
    trend = pm.Deterministic('trend', growth * t + offset)
    
    yearly_beta = pm.Normal('yearly_beta', 0, 1, shape = n_components*2)
    yearly_seasonality = pm.Deterministic('yearly_seasonality',at.dot(yearly_X(t, 365.25/len(t)), yearly_beta))
    
    monthly_beta = pm.Normal('monthly_beta', 0, 1, shape = monthly_n_components*2)
    monthly_seasonality = pm.Deterministic('monthly_seasonality',at.dot(monthly_X(t, 30.5/len(t)), monthly_beta))
    
    promo_beta = pm.Normal('promo_beta', 0, 1)
    
    predictions =  pm.Deterministic('predictions', np.exp(trend + yearly_seasonality + monthly_seasonality + promo_beta*promo))
    
    
    # error = pm.HalfCauchy('error', 0.5)
    
    pm.Poisson('obs',
              predictions,
              observed=df_sat['eaches']
        )
    
    trace = pymc.sampling_jax.sample_numpyro_nuts(tune=2000, draws = 2000)

t_test = np.arange(len(df_sat_test)) / len(df_sat_test)
s_test = np.linspace(0, np.max(t_test), n_changepoints+2)[1:-1]

a_test = (t_test[:, None] > s_test)*1

pm.set_data(new_data = {'promo':promo_test,
                       't':t_test,
                       's':s_test,
                       'a':a_test}, model = model)
test_ppc = pm.sample_posterior_predictive(trace, model=model)

test_ppc.posterior_predictive.obs.mean(dim=['draw','chain'])

The length of t_test and a_test and promo_test are all 12. The only other thing I can think of, is that the model was fit on 48 samples (4 years of monthly data). Not sure what to do if that’s the case. If the model is being ran to forecast out, I wouldn’t have observed data at that point.

The posterior predictive should be of shape: (nchains, ndraws, nobservations). You can check by printing:

test_ppc.posterior_predictive.dims

Thank you. I get Frozen({'chain': 4, 'draw': 2000, 'obs_dim_0': 48}). But the new data points I want to draw samples for are only length 12, which are the lengths of the data I’m putting in for pm.set_data argument shown below.

What is the correct way to be able to get the posterior predictive mean and hdi for the 12 new data points?

pm.set_data(new_data = {'promo':promo_test, 't':t_test, 's':s_test, 'a':a_test}, model = model) test_ppc = pm.sample_posterior_predictive(trace, model=model)

You define these variables here but then don’t use them in the model. When you use pm.set_data those are the variables being modified, but modifying them has no effect on the model because the model uses the numpy/pandas objects directly.

Thank you. What does t_ = pm.MutableData('t', t) become in terms of type of object when I define them inside the model?

When I try to get len(t_), I get an error

TypeError: object of type 'TensorSharedVariable' has no len()

But I need the length for these parts of the code:

yearly_beta = pm.Normal('yearly_beta', 0, 1, shape = n_components*2)
    yearly_seasonality = pm.Deterministic('yearly_seasonality',at.dot(yearly_X(t_, 365.25/**len(t_)**), yearly_beta))
    
    monthly_beta = pm.Normal('monthly_beta', 0, 1, shape = monthly_n_components*2)
    monthly_seasonality = pm.Deterministic('monthly_seasonality',at.dot(monthly_X(t_, 30.5/**len(t_)**), monthly_beta))

UPDATE:
I don’t think it’s the len. I did the following:
len_t = pm.ConstantData('len_t', len(t)). Then I got the following error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/tmp/ipykernel_5309/1167067047.py in <module>
     16 
     17     yearly_beta = pm.Normal('yearly_beta', 0, 1, shape = n_components*2)
---> 18     yearly_seasonality = pm.Deterministic('yearly_seasonality',at.dot(yearly_X(t_, 365.25/len(t)), yearly_beta))
     19 
     20     monthly_beta = pm.Normal('monthly_beta', 0, 1, shape = monthly_n_components*2)

/tmp/ipykernel_5309/3136029572.py in yearly_X(t, p, n)
      9 def yearly_X(t, p=365.25, n=n_components):
     10     x = 2 * np.pi * (np.arange(n)+1) * t[:, None] / p
---> 11     return np.concatenate((np.cos(x), np.sin(x)), axis = 1)
     12 
     13 monthly_n_components = 5

<__array_function__ internals> in concatenate(*args, **kwargs)

ValueError: zero-dimensional arrays cannot be concatenated

I think my fourier_seasonality function needs to be rewritten in aesara to be used with pm.Data object?

You get an aesara shared tensorvariable. They mostly behave the same as regular aesara tensors. To get their length you can use tensor.shape[0]. Keep in mind that Aesara is static, so the result of this will also be an aesara variable that is only eveluated when needed.

You do not want the len(t) you want the length of t_ which will change every time you use set_data and update t_ ("t" in the set_data dict). You should also not use constantdata here because you plan on modifying the data and it’s shape.

Thank you. When I use

yearly_seasonality = pm.Deterministic('yearly_seasonality',at.dot(yearly_X(t_, 365.25/t_.shape[0]), yearly_beta))

I get the following error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/tmp/ipykernel_6117/3787808774.py in <module>
     16 
     17     yearly_beta = pm.Normal('yearly_beta', 0, 1, shape = n_components*2)
---> 18     yearly_seasonality = pm.Deterministic('yearly_seasonality',at.dot(yearly_X(t_, 365.25/t_.shape[0]), yearly_beta))
     19 
     20     monthly_beta = pm.Normal('monthly_beta', 0, 1, shape = monthly_n_components*2)

/tmp/ipykernel_6117/3136029572.py in yearly_X(t, p, n)
      9 def yearly_X(t, p=365.25, n=n_components):
     10     x = 2 * np.pi * (np.arange(n)+1) * t[:, None] / p
---> 11     return np.concatenate((np.cos(x), np.sin(x)), axis = 1)
     12 
     13 monthly_n_components = 5

<__array_function__ internals> in concatenate(*args, **kwargs)

ValueError: zero-dimensional arrays cannot be concatenated

Is this because of the function yearly_X which is actually:

def yearly_X(t, p=365.25, n=n_components):
    x = 2 * np.pi * (np.arange(n)+1) * t[:, None] / p
    return np.concatenate((np.cos(x), np.sin(x)), axis = 1)

Should that be a function written in aesara for it to work with shared tensor variables?

When I used the np_array t, it was an array of shape(48,). But when I change t to a shared variable inside the model, the function reads it as a zero dimension…

I got it. I just ran the function outside the model context, then brought that in as mutable data.