I have begun implementing the changes you suggested in my original code, but I have run into a problem that I am struggling to resolve elegantly. I have a rough solution in mind (sketched at the bottom of this post), but I suspect there is a far more efficient approach. Additionally, indexing theano tensors, specifically the random variables, is a new concept to me, and I find it hard to grasp even though I have read the docs.
To illustrate the issue, I have modified your provided code and introduced new variables that demonstrate the problem. In this instance, some random variables correspond only to specific dates. Consequently, we need a way to index subsets of the training data and concatenate or append them to the non-special cases, or, better, a more elegant solution altogether. As the number of special cases grows, the indexing becomes increasingly convoluted.
I would appreciate any guidance on how to tackle this, or, if possible, a pointer to related work that could give me some insight.
Thanks in advance.
import numpy as np
import pandas as pd
import pymc as pm

# generate_input_data and compute_true_values are unchanged from your reply.


def main():
    # I increased the timerange because it didn't actually reach 140 observations
    # in your example. I also renamed "input" to "data" -- a personal preference.
    timerange = pd.date_range(start="2023-01-01", end="2023-12-31", freq='D')
    coefficientknots = {'0-15': 2.0, '15-32': 3.0}
    exponentknots = {'0-10': 0.9, '10-32': 0.3}
    specialdates = True
    if specialdates:
        coefficients_special_date_knots = {'0-24': 6.0, '24-32': 4.0}
        exponent_special_date_knots = {'0-19': 0.95, '19-32': 0.6}
    else:
        coefficients_special_date_knots = {}
        exponent_special_date_knots = {}
    data = generate_input_data(timerange)
    true_values = compute_true_values(timerange, coefficientknots, exponentknots,
                                      coefficients_special_date_knots, exponent_special_date_knots)
    # Gather everything into a dataframe for organization
    df = pd.DataFrame(np.c_[data, true_values], index=timerange, columns=['data', 'targets'])
    df['day'] = df.index.day
    # Special dates are every day in December (the day-of-month bounds 0..32 are
    # always satisfied, so the month check is all that matters)
    mask = df.index.month == 12
    # Create the map by assigning a bin index to each datapoint based on which day it falls into
    coefficients_knot_cutpoints = [0, 15, 32]
    df['coefficient_knot_idx_normal'] = pd.cut(df['day'], coefficients_knot_cutpoints, labels=False)
    df.loc[mask, 'coefficient_knot_idx_normal'] = -1
    exponent_knot_cutpoints = [0, 10, 32]
    df['exponent_knot_idx_normal'] = pd.cut(df['day'], exponent_knot_cutpoints, labels=False)
    df.loc[mask, 'exponent_knot_idx_normal'] = -1
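    # e.g. day 5 falls in bin 0 ('0-15') and day 20 in bin 1 ('15-32');
    # December rows are overwritten with -1 so they can be routed to the
    # special-date parameters instead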
    coefficient_knot_names = [f'{a}-{b}' for a, b in zip(coefficients_knot_cutpoints[:-1],
                                                         coefficients_knot_cutpoints[1:])]
    # These names will be used to make the model output look nicer
    exponent_knot_names = [f'{a}-{b}' for a, b in zip(exponent_knot_cutpoints[:-1],
                                                      exponent_knot_cutpoints[1:])]
    # These cutpoints must match the '0-19'/'19-32' keys of exponent_special_date_knots
    exponent_knot_cutpoints_special = [0, 19, 32]
    exponent_knot_names_special = [f'{a}-{b}' for a, b in zip(exponent_knot_cutpoints_special[:-1],
                                                              exponent_knot_cutpoints_special[1:])]
    df['exponent_knot_idx_special'] = -1
    df.loc[mask, 'exponent_knot_idx_special'] = pd.cut(df.loc[mask, 'day'],
                                                       exponent_knot_cutpoints_special,
                                                       labels=False)
    # ...and these the '0-24'/'24-32' keys of coefficients_special_date_knots
    coefficients_knot_cutpoints_special = [0, 24, 32]
    coefficients_knot_names_special = [f'{a}-{b}' for a, b in zip(coefficients_knot_cutpoints_special[:-1],
                                                                  coefficients_knot_cutpoints_special[1:])]
    df['coefficient_knot_idx_special'] = -1
    df.loc[mask, 'coefficient_knot_idx_special'] = pd.cut(df.loc[mask, 'day'],
                                                          coefficients_knot_cutpoints_special,
                                                          labels=False)
    # Coords let us refer to dimensions by name, rather than having to remember their shapes
    coords = {'exponent_knot_normal': exponent_knot_names,
              'exponent_knot_special': exponent_knot_names_special,
              'coefficient_knot_normal': coefficient_knot_names,
              'coefficient_knot_special': coefficients_knot_names_special}
    # Train-test split includes the knot mapping
    df_train = df.iloc[:140].copy()
    df_test = df.iloc[140:].copy()
    with pm.Model(coords=coords) as model:
        data_pt = pm.MutableData('input', df_train.data)
        targets_pt = pm.MutableData('convresponse', df_train.targets)
        # Here is the mapping; when we do out-of-sample prediction we'll update this as well
        coefficient_knot_idx_normal_pt = pm.MutableData('coefficient_knot_idx_normal',
                                                        df_train.coefficient_knot_idx_normal)
        exponent_knot_idx_normal_pt = pm.MutableData('exponent_knot_idx_normal',
                                                     df_train.exponent_knot_idx_normal)
        coefficient_knot_idx_special_pt = pm.MutableData('coefficient_knot_idx_special',
                                                         df_train.coefficient_knot_idx_special)
        exponent_knot_idx_special_pt = pm.MutableData('exponent_knot_idx_special',
                                                      df_train.exponent_knot_idx_special)
        # Using coords, draw the parameters for all normal knots at once...
        knot_coef_normal = pm.Normal('knot_coef', 0, 1, dims=['coefficient_knot_normal'])
        knot_exponent_normal = pm.Gamma('knot_exponent', 3, 3, dims=['exponent_knot_normal'])
        # ...and likewise for all special knots
        knot_coef_special = pm.Normal('knot_coef_special', 0, 3, dims=['coefficient_knot_special'])
        knot_exponent_special = pm.Gamma('knot_exponent_special', 1, 3, dims=['exponent_knot_special'])
        # How do I concatenate the two sets of parameters in a smart way, and
        # fix the mapping accordingly? Assuming we have even more such specials,
        # I think there should be a smarter way than hand-indexing each case.
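        # One idea (untested): stack the specials behind the normals and shift
        # the special indices by the number of normal knots, so a single
        # indexed lookup covers both, roughly:
        #     knot_coef_all = pm.math.concatenate([knot_coef_normal, knot_coef_special])
        #     coef_idx = normal idx, replaced by 2 + special idx on special dates
        # (see the fuller sketch after main() at the bottom of this post)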
        # For now only the normal knots enter the mean; wiring in the special
        # dates cleanly is exactly what I am stuck on
        base = knot_coef_normal[coefficient_knot_idx_normal_pt] * data_pt
        modelledresponse = base ** knot_exponent_normal[exponent_knot_idx_normal_pt]
        sigma = pm.HalfCauchy(name="sigma", beta=10)
        outcome = pm.Normal(name="outcome",
                            mu=modelledresponse,
                            sigma=sigma,
                            observed=targets_pt,
                            shape=data_pt.shape[0])  # Important!
        # I found the exponents sometimes blow up, so start them at 1
        idata = pm.sample(initvals={'knot_exponent': np.ones(2)})
    with model:
        idata = pm.sample_posterior_predictive(idata, extend_inferencedata=True)
        # Update every piece of mutable data for out-of-sample prediction,
        # including all four knot mappings
        pm.set_data({'input': df_test.data,
                     'coefficient_knot_idx_normal': df_test.coefficient_knot_idx_normal,
                     'exponent_knot_idx_normal': df_test.exponent_knot_idx_normal,
                     'coefficient_knot_idx_special': df_test.coefficient_knot_idx_special,
                     'exponent_knot_idx_special': df_test.exponent_knot_idx_special})
        idata = pm.sample_posterior_predictive(idata, extend_inferencedata=True, predictions=True)
if __name__ == '__main__':
    main()
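For completeness, here is roughly the solution I have in mind, as an untested sketch under the column names above (the helper name flat_knot_index and the hard-coded count of two normal knots are just for illustration): stack the special parameters behind the normal ones and collapse each pair of index columns into one flat index, so a single indexed lookup serves every row regardless of how many special cases are added.

def flat_knot_index(df, normal_col, special_col, n_normal):
    # Hypothetical helper: rows whose special index is >= 0 are special dates;
    # shift those indices past the normal knots so both groups can share one
    # concatenated parameter vector.
    idx = df[normal_col].to_numpy().copy()
    special = df[special_col].to_numpy()
    idx[special >= 0] = n_normal + special[special >= 0]
    return idx.astype('int64')

# Inside the model this would replace the four separate index variables:
#     knot_coef_all = pm.math.concatenate([knot_coef_normal, knot_coef_special])
#     knot_exponent_all = pm.math.concatenate([knot_exponent_normal, knot_exponent_special])
#     coef_idx_pt = pm.MutableData(
#         'coef_idx', flat_knot_index(df_train, 'coefficient_knot_idx_normal',
#                                     'coefficient_knot_idx_special', n_normal=2))
#     exp_idx_pt = pm.MutableData(
#         'exp_idx', flat_knot_index(df_train, 'exponent_knot_idx_normal',
#                                    'exponent_knot_idx_special', n_normal=2))
#     modelledresponse = (knot_coef_all[coef_idx_pt] * data_pt) ** knot_exponent_all[exp_idx_pt]

This at least keeps the number of data containers constant as specials are added, but I am not sure it is the idiomatic way to do it, which is why I am asking.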