How can I sample posteriors from training data and run posterior predictives on test data (with a complex factor graph)?

I have begun implementing the changes you suggested in my original code, but I have run into a problem that I am struggling to resolve elegantly. I have a solution mapped out in my head, but I believe there may be a far more efficient approach. Additionally, indexing Theano tensors, specifically the random variables, is a new concept to me, and I find it hard to grasp even though I have read the docs.

To illustrate the issue, I have modified your code and introduced new variables that demonstrate the problem. In this instance we have random variables that correspond to specific dates. Consequently, we need a way to index subsets of the training data and concatenate or append them to the non-special cases, or a more elegant alternative. As the number of special cases grows, the indexing becomes increasingly complicated.
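To make the indexing concrete: as I understand it, the per-observation mapping works by fancy indexing a parameter vector with an integer index array, something like this plain-NumPy sketch (all names here are made up for illustration):

```python
import numpy as np

# One parameter per knot, and an integer "bin index" per observation
knot_params = np.array([2.0, 3.0])    # e.g. one coefficient per knot
knot_idx = np.array([0, 0, 1, 1, 0])  # which knot each observation falls into

# Fancy indexing broadcasts the per-knot parameters out to per-observation values
per_obs_coef = knot_params[knot_idx]
print(per_obs_coef)  # [2. 2. 3. 3. 2.]
```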

I would appreciate any guidance on how to tackle this, or, if possible, a pointer to related work that could give me some insight.

Thanks in advance.
import numpy as np
import pandas as pd
import pymc as pm
import pytensor.tensor as tt  # or aesara.tensor, depending on your PyMC version

# generate_input_data() and compute_true_values() are the helpers from your
# earlier example; I assume they are in scope.


def main():
    # I increased the timerange because it didn't actually make it to 140 observations
    # in your example. I also renamed "input" to "data" -- a personal preference.

    timerange = pd.date_range(start="2023-01-01", end="2023-12-31", freq='D')
    coefficientknots = {'0-15': 2.0, '15-32': 3.0}
    exponentknots = {'0-10': 0.9, '10-32': 0.3}

    specialdates = True
    if specialdates:
        coefficients_special_date_knots = {'0-24': 6.0, '24-32': 4.0}
        exponent_special_date_knots = {'0-19': 0.95, '19-32': 0.6}
    else:
        coefficients_special_date_knots = {}
        exponent_special_date_knots = {}

    data = generate_input_data(timerange)
    true_values = compute_true_values(timerange, coefficientknots, exponentknots,
                                      coefficients_special_date_knots, exponent_special_date_knots)

    # Gather stuff into a dataframe for organization
    df = pd.DataFrame(np.c_[data, true_values], index=timerange, columns=['data', 'targets'])
    df['day'] = df.index.day

    # Days run 1-31, so this mask is simply "all of December"
    mask = df.index.month == 12
    
    # Create the map by assigning a bin index to each datapoint based on which day-bin it falls into
    coefficients_knot_cutpoints = [0, 15, 32]
    df['coefficient_knot_idx_normal'] = pd.cut(df['day'], coefficients_knot_cutpoints, labels=False)
    df.loc[mask, 'coefficient_knot_idx_normal'] = -1
    
    exponent_knot_cutpoints = [0, 10, 32]
    df['exponent_knot_idx_normal'] = pd.cut(df['day'], exponent_knot_cutpoints, labels=False)
    df.loc[mask, 'exponent_knot_idx_normal'] = -1

    coefficient_knot_names = [f'{a}-{b}' for a, b in zip(coefficients_knot_cutpoints[:-1],
                                                         coefficients_knot_cutpoints[1:])]

    # These names will be used to make the model output look nicer
    exponent_knot_names = [f'{a}-{b}' for a, b in zip(exponent_knot_cutpoints[:-1],
                                                      exponent_knot_cutpoints[1:])]

    exponent_knot_cutpoints_special = [0, 19, 32]  # matches exponent_special_date_knots above
    exponent_knot_names_special = [f'{a}-{b}' for a, b in zip(exponent_knot_cutpoints_special[:-1],
                                                              exponent_knot_cutpoints_special[1:])]
    df['exponent_knot_idx_special'] = -1
    df.loc[mask, 'exponent_knot_idx_special'] = pd.cut(df.loc[mask, 'day'],
                                                       exponent_knot_cutpoints_special,
                                                       labels=False)

    coefficients_knot_cutpoints_special = [0, 24, 32]  # matches coefficients_special_date_knots above
    coefficients_knot_names_special = [f'{a}-{b}' for a, b in zip(coefficients_knot_cutpoints_special[:-1],
                                                                  coefficients_knot_cutpoints_special[1:])]
    df['coefficient_knot_idx_special'] = -1
    df.loc[mask, 'coefficient_knot_idx_special'] = pd.cut(df.loc[mask, 'day'],
                                                          coefficients_knot_cutpoints_special,
                                                          labels=False)

    # Coords let us refer to dimensions by name, rather than having to remember their shapes
    coords = {'exponent_knot_normal': exponent_knot_names,
              'exponent_knot_special': exponent_knot_names_special,
              'coefficient_knot_normal': coefficient_knot_names,
              'coefficient_knot_special': coefficients_knot_names_special}

    # Train-test split includes the knot mapping
    df_train = df.iloc[:140].copy()
    df_test = df.iloc[140:].copy()


    with pm.Model(coords=coords) as model:
        data_pt = pm.MutableData('input', df_train.data)
        targets_pt = pm.MutableData('convresponse', df_train.targets)

        # Here is the mapping; when we do out-of-sample prediction we'll update this as well
        coefficient_knot_idx_normal_pt = pm.MutableData('coefficient_knot_idx_normal', df_train.coefficient_knot_idx_normal)
        exponent_knot_idx_normal_pt = pm.MutableData('exponent_knot_idx_normal', df_train.exponent_knot_idx_normal)

        coefficient_knot_idx_special_pt = pm.MutableData('coefficient_knot_idx_special',
                                                        df_train.coefficient_knot_idx_special)
        exponent_knot_idx_special_pt = pm.MutableData('exponent_knot_idx_special', df_train.exponent_knot_idx_special)

        # Using coords, draw the parameters for all normal knots at once
        knot_coef_normal = pm.Normal('knot_coef', 0, 1, dims=['coefficient_knot_normal'])
        knot_exponent_normal = pm.Gamma('knot_exponent', 3, 3, dims=['exponent_knot_normal'])

        # ... and the same for the special-date knots
        knot_coef_special = pm.Normal('knot_coef_special', 0, 3, dims=['coefficient_knot_special'])
        knot_exponent_special = pm.Gamma('knot_exponent_special', 1, 3, dims=['exponent_knot_special'])

        # How can I concatenate the two sets of knots above in a smart way,
        # and fix the index mapping accordingly? Assuming there are even more
        # such special cases, I think there must be a smarter approach.

        # NOTE: the rows masked as special still carry index -1 here, which fancy
        # indexing silently treats as "last element" -- this is exactly my problem.
        modelledresponse = tt.power(knot_coef_normal[coefficient_knot_idx_normal_pt] * data_pt,
                                    knot_exponent_normal[exponent_knot_idx_normal_pt])

        sigma = pm.HalfCauchy(name="sigma", beta=10)
        outcome = pm.Normal(name="outcome",
                            mu=modelledresponse,
                            sigma=sigma,
                            observed=targets_pt,
                            shape=data_pt.shape[0])  # Important for out-of-sample prediction!

        # I found the exponents sometimes blow up, so start them at 1
        # (initvals belongs to pm.sample, not to the observed variable)
        idata = pm.sample(initvals={'knot_exponent': [1, 1]})

    with model:
        idata = pm.sample_posterior_predictive(idata, extend_inferencedata=True)
        # Update the data *and* every knot-index mapping for the test period
        pm.set_data({'input': df_test.data,
                     'coefficient_knot_idx_normal': df_test.coefficient_knot_idx_normal,
                     'exponent_knot_idx_normal': df_test.exponent_knot_idx_normal,
                     'coefficient_knot_idx_special': df_test.coefficient_knot_idx_special,
                     'exponent_knot_idx_special': df_test.exponent_knot_idx_special})
        idata = pm.sample_posterior_predictive(idata, extend_inferencedata=True, predictions=True)


if __name__ == '__main__':
    main()
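For what it's worth, the scheme I have in mind is to concatenate the normal and special parameter vectors and build one global index per observation, so the -1 sentinels disappear. Here is a plain-NumPy sketch of just the index arithmetic (the values and array names are made up; in the PyMC model I assume the random variables could be concatenated the same way with `tt.concatenate`):

```python
import numpy as np

coef_normal = np.array([2.0, 3.0])   # normal-knot coefficients
coef_special = np.array([6.0, 4.0])  # special-date coefficients

# Per-observation indices as in the dataframe: -1 marks "the other regime"
idx_normal = np.array([0, 1, -1, -1, 0])
idx_special = np.array([-1, -1, 0, 1, -1])

# Concatenate the parameters and offset the special indices past the normal block
coef_all = np.concatenate([coef_normal, coef_special])
is_special = idx_normal == -1
idx_all = np.where(is_special, idx_special + len(coef_normal), idx_normal)

# Now a single fancy-indexing step covers both regimes
per_obs = coef_all[idx_all]
print(per_obs)  # [2. 3. 6. 4. 2.]
```

With more special cases this would generalize to one offset per block, but I am not sure it is the cleanest way, which is why I am asking.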