Why does my new data with length of 12, give me 48 posterior predictive observations?

jordan.howell2 · June 17, 2022, 6:22pm

Hello,

I have a prophet-like time series model. I’ve lopped off the year 2021 for an out of sample test data set. When I draw from the posterior, then use arviz to pull the mean values of the predictions, I’m getting 48 values, not 12. I’m sure it has to do with the 4 chains but I’m calculating the dimensions along the chains and draws.

Here is the model and sample code:

n_changepoints = 8
t = np.arange(len(df_sat)) / len(df_sat)
s = np.linspace(0, np.max(t), n_changepoints+2)[1:-1]

a = (t[:, None] > s)*1

n_components = 10

def yearly_X(t, p=365.25, n=n_components):
    x = 2 * np.pi * (np.arange(n)+1) * t[:, None] / p
    return np.concatenate((np.cos(x), np.sin(x)), axis = 1)

monthly_n_components = 5
def monthly_X(t, p=30.5, n=monthly_n_components):
    x = 2 * np.pi * (np.arange(n)+1) * t[:, None] / p
    return np.concatenate((np.cos(x), np.sin(x)), axis = 1) 

df_sat['promo_flag_month_before'] = df_sat['promo_flag'].shift(1)
df_sat['promo_flag_month_before'] = df_sat['promo_flag_month_before'].fillna(method = 'bfill')
promo = np.array(df_sat['promo_flag_month_before'])

df_sat_test['promo_flag_month_before'] = df_sat_test['promo_flag'].shift(1)
df_sat_test['promo_flag_month_before'] = df_sat_test['promo_flag_month_before'].fillna(method = 'bfill')
promo_test = np.array(df_sat_test['promo_flag_month_before'])

with pm.Model() as model:
    promo_ = pm.MutableData('promo', promo)
    t_ = pm.MutableData('t', t)
    a_ = pm.MutableData('a', a)
    s_ = pm.MutableData('s', s)
    
    k = pm.Normal('k', 0, 1)
    m = pm.Normal('m', 0, 5)
    delta = pm.Laplace('delta', 0, 0.1, shape=n_changepoints)
    
    growth = k + at.dot(a, delta)
    offset = (m + at.dot(a, -s * delta))
    trend = pm.Deterministic('trend', growth * t + offset)
    
    yearly_beta = pm.Normal('yearly_beta', 0, 1, shape = n_components*2)
    yearly_seasonality = pm.Deterministic('yearly_seasonality',at.dot(yearly_X(t, 365.25/len(t)), yearly_beta))
    
    monthly_beta = pm.Normal('monthly_beta', 0, 1, shape = monthly_n_components*2)
    monthly_seasonality = pm.Deterministic('monthly_seasonality',at.dot(monthly_X(t, 30.5/len(t)), monthly_beta))
    
    promo_beta = pm.Normal('promo_beta', 0, 1)
    
    predictions =  pm.Deterministic('predictions', np.exp(trend + yearly_seasonality + monthly_seasonality + promo_beta*promo))
    
    
    # error = pm.HalfCauchy('error', 0.5)
    
    pm.Poisson('obs',
              predictions,
              observed=df_sat['eaches']
        )
    
    trace = pymc.sampling_jax.sample_numpyro_nuts(tune=2000, draws = 2000)

t_test = np.arange(len(df_sat_test)) / len(df_sat_test)
s_test = np.linspace(0, np.max(t_test), n_changepoints+2)[1:-1]

a_test = (t_test[:, None] > s_test)*1

pm.set_data(new_data = {'promo':promo_test,
                       't':t_test,
                       's':s_test,
                       'a':a_test}, model = model)
test_ppc = pm.sample_posterior_predictive(trace, model=model)

test_ppc.posterior_predictive.obs.mean(dim=['draw','chain'])

The length of t_test and a_test and promo_test are all 12. The only other thing I can think of, is that the model was fit on 48 samples (4 years of monthly data). Not sure what to do if that’s the case. If the model is being ran to forecast out, I wouldn’t have observed data at that point.

cluhmann · June 17, 2022, 6:54pm

The posterior predictive should be of shape: (nchains, ndraws, nobservations). You can check by printing:

test_ppc.posterior_predictive.dims

jordan.howell2 · June 17, 2022, 6:57pm

Thank you. I get Frozen({'chain': 4, 'draw': 2000, 'obs_dim_0': 48}). But the new data points I want to draw samples for are only length 12, which are the lengths of the data I’m putting in for pm.set_data argument shown below.

What is the correct way to be able to get the posterior predictive mean and hdi for the 12 new data points?

pm.set_data(new_data = {'promo':promo_test, 't':t_test, 's':s_test, 'a':a_test}, model = model) test_ppc = pm.sample_posterior_predictive(trace, model=model)

OriolAbril · June 17, 2022, 7:01pm

You define these variables here but then don’t use them in the model. When you use pm.set_data those are the variables being modified, but modifying them has no effect on the model because the model uses the numpy/pandas objects directly.

jordan.howell2 · June 17, 2022, 7:06pm

Thank you. What does t_ = pm.MutableData('t', t) become in terms of type of object when I define them inside the model?

When I try to get len(t_), I get an error

TypeError: object of type 'TensorSharedVariable' has no len()

But I need the length for these parts of the code:

yearly_beta = pm.Normal('yearly_beta', 0, 1, shape = n_components*2)
    yearly_seasonality = pm.Deterministic('yearly_seasonality',at.dot(yearly_X(t_, 365.25/**len(t_)**), yearly_beta))
    
    monthly_beta = pm.Normal('monthly_beta', 0, 1, shape = monthly_n_components*2)
    monthly_seasonality = pm.Deterministic('monthly_seasonality',at.dot(monthly_X(t_, 30.5/**len(t_)**), monthly_beta))

UPDATE:
I don’t think it’s the len. I did the following:
len_t = pm.ConstantData('len_t', len(t)). Then I got the following error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/tmp/ipykernel_5309/1167067047.py in <module>
     16 
     17     yearly_beta = pm.Normal('yearly_beta', 0, 1, shape = n_components*2)
---> 18     yearly_seasonality = pm.Deterministic('yearly_seasonality',at.dot(yearly_X(t_, 365.25/len(t)), yearly_beta))
     19 
     20     monthly_beta = pm.Normal('monthly_beta', 0, 1, shape = monthly_n_components*2)

/tmp/ipykernel_5309/3136029572.py in yearly_X(t, p, n)
      9 def yearly_X(t, p=365.25, n=n_components):
     10     x = 2 * np.pi * (np.arange(n)+1) * t[:, None] / p
---> 11     return np.concatenate((np.cos(x), np.sin(x)), axis = 1)
     12 
     13 monthly_n_components = 5

<__array_function__ internals> in concatenate(*args, **kwargs)

ValueError: zero-dimensional arrays cannot be concatenated

I think my fourier_seasonality function needs to be rewritten in aesara to be used with pm.Data object?

OriolAbril · June 19, 2022, 5:02pm

You get an aesara shared tensorvariable. They mostly behave the same as regular aesara tensors. To get their length you can use tensor.shape[0]. Keep in mind that Aesara is static, so the result of this will also be an aesara variable that is only eveluated when needed.

You do not want the len(t) you want the length of t_ which will change every time you use set_data and update t_ ("t" in the set_data dict). You should also not use constantdata here because you plan on modifying the data and it’s shape.

jordan.howell2 · June 21, 2022, 10:04am

Thank you. When I use

yearly_seasonality = pm.Deterministic('yearly_seasonality',at.dot(yearly_X(t_, 365.25/t_.shape[0]), yearly_beta))

I get the following error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/tmp/ipykernel_6117/3787808774.py in <module>
     16 
     17     yearly_beta = pm.Normal('yearly_beta', 0, 1, shape = n_components*2)
---> 18     yearly_seasonality = pm.Deterministic('yearly_seasonality',at.dot(yearly_X(t_, 365.25/t_.shape[0]), yearly_beta))
     19 
     20     monthly_beta = pm.Normal('monthly_beta', 0, 1, shape = monthly_n_components*2)

/tmp/ipykernel_6117/3136029572.py in yearly_X(t, p, n)
      9 def yearly_X(t, p=365.25, n=n_components):
     10     x = 2 * np.pi * (np.arange(n)+1) * t[:, None] / p
---> 11     return np.concatenate((np.cos(x), np.sin(x)), axis = 1)
     12 
     13 monthly_n_components = 5

<__array_function__ internals> in concatenate(*args, **kwargs)

ValueError: zero-dimensional arrays cannot be concatenated

Is this because of the function yearly_X which is actually:

def yearly_X(t, p=365.25, n=n_components):
    x = 2 * np.pi * (np.arange(n)+1) * t[:, None] / p
    return np.concatenate((np.cos(x), np.sin(x)), axis = 1)

Should that be a function written in aesara for it to work with shared tensor variables?

When I used the np_array t, it was an array of shape(48,). But when I change t to a shared variable inside the model, the function reads it as a zero dimension…

jordan.howell2 · June 21, 2022, 11:15am

I got it. I just ran the function outside the model context, then brought that in as mutable data.

Topic		Replies	Views
Sample_posterior_predicitve not catching shape of new data v5 prediction	10	1265	August 24, 2022
Plotting posterior predictions from new data - shape mismatch version agnostic	4	538	January 21, 2023
Unable to predict using set_value with an errors-in-variables model Questions	9	1587	December 5, 2021
Out of sample prediction v5 modeling	20	1879	April 7, 2023
About sample_posterior_predictive v5 bug	1	419	June 28, 2023

Why does my new data with length of 12, give me 48 posterior predictive observations?

Related topics