How can I sample posteriors from training data, then run posterior predictive on testing data (with a complex factor graph)?

I see two issues mixed up in the question, one related to indexing and one related to modeling.

For indexing, it seems like you mostly have everything well in hand. All you are doing here is making indicator features for the different types of dates you want to highlight. I agree the fancy-indexing approach takes some getting used to; if you’re more familiar with the dummy-variables approach, you can run a few experiments to convince yourself the two are exactly equivalent. As with everything related to Bayesian statistics, there’s a nice Richard McElreath lecture that goes over what indicator variables are and how they are used. Perhaps it will be useful?
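To see that equivalence concretely, here is a minimal sketch in plain numpy (all names and values here are hypothetical, not from your model): looking coefficients up by index gives exactly the same result as multiplying a one-hot dummy matrix by the coefficient vector.

```python
import numpy as np

# Hypothetical setup: 3 categories, one coefficient each.
coefs = np.array([0.5, 2.0, -1.0])
idx = np.array([0, 2, 1, 1, 0])          # category index for each observation

# Fancy-indexing approach: look up each observation's coefficient.
by_index = coefs[idx]

# Dummy-variable approach: one-hot design matrix times the coefficient vector.
dummies = np.eye(3)[idx]                 # shape (5, 3), a single 1 per row
by_dummies = dummies @ coefs

print(np.allclose(by_index, by_dummies))  # → True
```

The indexed version is just the dummy-variable regression with the zero multiplications skipped.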

This is essentially feature engineering, and there’s not really any getting away from it, index variables or no.

Where I think you are going wrong is with the -1 assignment to December (the check day > 0 & day < 32 is not needed, since all days fall in that range). Assigning -1 isn’t going to make the model ignore December; it’s going to grab the last element of the parameter vector and give it to December (think about what ndarray[-1] signifies).
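A quick numpy illustration of the wrap-around (the coefficient values here are made up):

```python
import numpy as np

knot_coef = np.array([0.1, 0.2, 0.3, 0.4])  # hypothetical coefficient vector
month_idx = np.array([0, 1, -1])            # -1 intended to mean "skip this one"

# -1 does not skip anything: negative indices wrap around, so the
# observation indexed with -1 silently receives the LAST coefficient (0.4).
print(knot_coef[month_idx])  # → [0.1 0.2 0.4]
```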

If you want December to have a different parameter from all the rest, you can make an offset parameter and add it to the base coefficient. Something like:

(knot_coef_normal[coef_idx_normal] + december_knot_coef * is_december) * data_pt

Where is_december is a vector of 1s and 0s. Here I use multiplication with an indicator vector because you don’t want any effect on the non-December knots, so you can’t really use a lookup (well, you could, but you’d have to concatenate a zero to the coefficient vector, and I think that’s more trouble than it’s worth).
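For what it’s worth, the two formulations are numerically identical; here is a sketch with hypothetical numpy stand-ins showing the indicator-multiplication version against the concatenate-a-zero lookup version:

```python
import numpy as np

base = np.array([1.0, 2.0, 3.0])         # hypothetical base coefficients
idx = np.array([0, 1, 2, 1])             # knot index per observation
is_december = np.array([0, 1, 0, 1])     # indicator vector of 1s and 0s
december_offset = 0.5                    # hypothetical December offset

# Indicator multiplication: the offset only lands where is_december == 1.
coef_a = base[idx] + december_offset * is_december

# Lookup alternative: prepend a zero so index 0 means "no offset".
offsets = np.array([0.0, december_offset])
coef_b = base[idx] + offsets[is_december]

print(np.allclose(coef_a, coef_b))  # → True
```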

This segues into the modeling part. You have to decide for yourself how things go together. In the example above, I assumed the coefficients combine linearly, so December is the same as everyone else, plus or minus some offset. You can carry on like this for as many effects as you want. You could also entertain some other functional form, but linear is always the best place to start. Given the code you posted, you might try:

        base_coef = knot_coef_normal[coefficient_knot_idx_normal_pt] + knot_coef_special[coefficient_knot_idx_special_pt]
        exp_coef = knot_exponent_normal[exponent_knot_idx_normal_pt] + knot_exponent_special[exponent_knot_idx_special_pt]

        modelledresponse = tt.power(base_coef * data_pt, exp_coef)

Where ultimately base_coef and exp_coef could include as many additive terms as you wish.
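To make the structure concrete, here is a runnable sketch of that additive-then-power form using plain numpy in place of the Theano tensors; all the values and the simplified shared index arrays are made up for illustration:

```python
import numpy as np

# Hypothetical stand-ins for the posted variables.
knot_coef_normal = np.array([1.0, 1.5])
knot_coef_special = np.array([0.0, 0.3])
knot_exponent_normal = np.array([1.0, 2.0])
knot_exponent_special = np.array([0.0, -0.5])

coef_idx = np.array([0, 1, 1])       # per-observation knot indices (assumed
exp_idx = np.array([0, 0, 1])        # shared here for simplicity)
data_pt = np.array([2.0, 3.0, 4.0])

# Each quantity is a sum of as many additive terms as you like.
base_coef = knot_coef_normal[coef_idx] + knot_coef_special[coef_idx]
exp_coef = knot_exponent_normal[exp_idx] + knot_exponent_special[exp_idx]

# Same functional form as tt.power(base_coef * data_pt, exp_coef).
modelledresponse = np.power(base_coef * data_pt, exp_coef)
print(modelledresponse)
```

In the actual model you would keep the `tt.` operations so the gradients flow through to the sampler; numpy is used here only to show the shapes and arithmetic.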
