Obtaining predictions from multilevel model trace

Hi all, I’m trying to run a multilevel model to estimate an electricity consumption time series for a building, but I’m having trouble generating predictions once I’ve obtained the trace.

My goal is to obtain hourly electricity consumption values depending on the time of day (daypart) and on the outdoor temperature, so I formulated this partial pooling model where I have an intercept and a temperature slope that are partially pooled across the different parts of the day.

with pm.Model(coords=coords) as partial_pooling:

    daypart_idx = pm.Data("daypart_idx", daypart, dims="obs_id")
    temperature = pm.Data("temperature", outdoor_temp, dims="obs_id")

    #Hyperpriors:
    a = pm.Normal("a", mu=0.0, sigma=10.0)
    sigma_a = pm.Exponential("sigma_a", 1.0)
    bt = pm.Normal("bt", mu=0.0, sigma=1.0)
    sigma_bt = pm.Exponential("sigma_bt", 0.5)

    #Varying intercepts:
    a_daypart = pm.Normal("a_daypart", mu=a, sigma=sigma_a, dims="Daypart")
    #Varying slopes:
    bt_daypart = pm.Normal("bt_daypart", mu=bt, sigma=sigma_bt, dims="Daypart")

    #Electricity prediction 
    mu = a_daypart[daypart_idx] + bt_daypart[daypart_idx] * temperature
    #Model error:
    sigma = pm.Exponential("sigma", 1.0)
    #Likelihood
    y = pm.Normal("y", mu, sigma=sigma, observed=log_electricity, dims="obs_id")

with partial_pooling:
    partial_pooling_trace = pm.sample(random_seed=RANDOM_SEED)
    partial_pooling_idata = az.from_pymc3(partial_pooling_trace)

My problem is that now I can’t seem to find a way to get from the estimated parameter distributions to the predictions. For example, my goal would be to estimate what the electricity consumption would be at a certain hour of the day and given a certain outdoor temperature value.

All the notebook examples I found define a model, then obtain the trace by fitting the model with sampling or some other method. None of the notebooks explains how to use that trace to obtain predictions. My idea is that this should happen through some kind of sampling from the obtained trace, but I’m not sure if I’m missing something important here.

Thanks a lot in advance for the help!


Yes, you do need to sample from the posterior predictive, using pm.sample_posterior_predictive(). Note that you also need to set the values you want to predict for. You can do this using the pm.Data container, so that you can change those values at prediction time. See here for more information: Prior and Posterior Predictive Checks — PyMC3 3.10.0 documentation
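To illustrate the mechanics with plain NumPy (a sketch only — all array names and values here are hypothetical stand-ins, not your actual posterior), sampling from the posterior predictive amounts to computing mu for each posterior draw and then sampling from the likelihood:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical posterior draws: 4000 draws, 6 dayparts
n_draws, n_dayparts = 4000, 6
a_draws = rng.normal(2.0, 0.1, size=(n_draws, n_dayparts))     # varying intercepts
bt_draws = rng.normal(0.05, 0.01, size=(n_draws, n_dayparts))  # temperature slopes
sigma_draws = rng.exponential(0.1, size=n_draws)               # likelihood sd

# Hypothetical new inputs to predict for
new_daypart_idx = np.array([0, 1, 2, 3, 4, 5])
new_temperature = np.array([10.0, 12.0, 15.0, 20.0, 18.0, 14.0])

# One mu per posterior draw and per new observation
mu = a_draws[:, new_daypart_idx] + bt_draws[:, new_daypart_idx] * new_temperature

# Posterior predictive: add the observation noise from the likelihood
y_pred = rng.normal(mu, sigma_draws[:, None])

print(y_pred.shape)  # one predictive draw per posterior draw: (4000, 6)
```

This is what pm.sample_posterior_predictive() does for you under the hood for a model like this, using whatever data is currently in the pm.Data containers.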


Thank you very much for the answer Thomas. I was indeed able to calculate my predictions by using the pm.sample_posterior_predictive() function as described in the notebook you linked.

I’m now trying to calculate HDIs for my predictions using the az.hdi() function, but I’ve been running into various issues. My idea is that, to obtain an HDI interval for my predictions, I should calculate HDIs for each of the posterior distributions of the parameters in my model and then combine them as specified by the model. But I’m not sure this is the correct approach, and I’m also not sure how to obtain those HDIs, since when I try to run, for example,

az.hdi(posterior_predictive['a_daypart'])

I get the error:

UserWarning: More chains (4000) than draws (6). Passed array should have shape (chains, draws, *shape)

which I think is related to the fact that my posterior predictive really is just 1 chain with 4000 draws, while 6 is the number of different ‘dayparts’, but ArviZ doesn’t pick up on that.

Overall I’m just really confused about how the ArviZ data structure works and how to calculate HDIs in general, and I’d be really grateful if someone could shed some light or point me to any resources that can help!

It looks like ArviZ is just unhappy with the shape of the data you passed it. You can reshape the array contained in posterior_predictive['a_daypart'] to fit the shape it expects. I’m guessing that currently posterior_predictive['a_daypart'].shape is something like (4000, 6), so you can reshape it to (1, 4000, 6) by just doing posterior_predictive['a_daypart'][None, :, :]. I hope this helps!
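For example, with a hypothetical (4000, 6) array of draws standing in for the real posterior predictive samples:

```python
import numpy as np

# Hypothetical samples: 4000 draws for 6 dayparts, from a single chain
samples = np.random.default_rng(0).normal(size=(4000, 6))

# az.hdi expects unlabeled arrays shaped (chain, draw, *shape),
# so add a leading length-1 chain dimension
samples_3d = samples[None, :, :]

print(samples_3d.shape)  # (1, 4000, 6)
```

With that extra dimension, az.hdi treats the array as 1 chain of 4000 draws over 6 dayparts instead of 4000 chains of 6 draws.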


Somehow I missed the initial question. ArviZ expects the array data either to have labeled dimensions, two of which must be chain and draw, or to follow the (chain, draw, *shape) convention if dimensions are not labeled.

To get the result of sample_posterior_predictive to generate arrays following ArviZ convention, you can use the keep_size argument.

There are also some examples of using hdi with multilevel models, including calculating hdi on groupby objects, in the “radon” notebook: A Primer on Bayesian Methods for Multilevel Modeling — PyMC3 3.10.0 documentation (hdi is used several times, so I am not linking to any specific section; search for hdi within the page to see all the occurrences).


Many thanks @jhrcook and @OriolAbril for the answers! Only in the last few days have I managed to find the time to get back to this, and the solutions you suggested indeed allowed me to calculate HDIs.

I’m now facing another problem, though: my predictions using the sample_posterior_predictive function seem to be way off the expected target (by roughly an order of magnitude). On the other hand, if instead of using sample_posterior_predictive I take the parameters directly from the trace, the predictions seem pretty accurate.

More concretely, if I want to calculate predictions for the model that I described in the original post, the following approach:

with partial_pooling:

    pm.set_data({"daypart_idx": daypart, "temperature": temp_ts})

    posterior_predictive = pm.sample_posterior_predictive(partial_pooling_trace, var_names=["a_daypart", "bt_daypart"])

a_daypart_means_1 = posterior_predictive['a_daypart'].mean(0)
bt_daypart_means_1 = posterior_predictive['bt_daypart'].mean(0)

predictions_1 = a_daypart_means_1[np.ix_(dayparts)] + bt_daypart_means_1[np.ix_(dayparts)] * temp_ts

Produces predictions that are way off target, while:

a_daypart_means_2 = np.mean(partial_pooling_trace['a_daypart'], axis=0)
bt_daypart_means_2 = np.mean(partial_pooling_trace['bt_daypart'], axis=0)

predictions_2 = a_daypart_means_2[np.ix_(dayparts)] + bt_daypart_means_2[np.ix_(dayparts)] * temp_ts

produces pretty accurate predictions. I was fairly sure the first way is the correct way to proceed, but I’m not really clear on the difference between the two approaches.

Could someone shed some light on that? Is the second approach causing severe overfitting? Also if anyone could suggest some ideas to improve the predictions obtained with the first approach, that would be amazing.

Thanks a lot again for the great support provided on Pymc discourse!

a_daypart and bt_daypart are variables of the posterior and should not be sampled by the posterior predictive. I remember a recent topic on posterior predictive sampling and what it can and can’t do; I will try to add it to the conversation later unless someone beats me to it.

I also think a topic around an example from the rethinking book, where we discussed multiple ways to get posterior predictive samples, will help.


Links to the related topics:

This last topic contains a link to a notebook which could be quite helpful on its own.


To add to @OriolAbril’s answer: to get parameter estimates, you can just take them directly from the trace, as you do in your second method. Generally, pm.sample_posterior_predictive() is used to get samples from the model’s prediction of the observed data. So in this case, you can use pm.sample_posterior_predictive(partial_pooling_trace)["y"] to get posterior predictions. Then, if you change the data in the shared variables daypart_idx and temperature, you can get predictions on new/unseen/artificial data. This is great for checking your model’s posterior predictions over a wide variety of input value combinations.
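As a side note on the two approaches: because the predictor here is linear in the parameters, averaging the parameters first and then predicting gives the same point prediction as computing one prediction per posterior draw and then averaging; the per-draw version additionally gives you the predictive uncertainty. A NumPy sketch with hypothetical draws and inputs (none of these values come from the actual model):

```python
import numpy as np

rng = np.random.default_rng(1)
n_draws, n_dayparts = 4000, 6

# Hypothetical posterior draws for the varying intercepts and slopes
a_draws = rng.normal(2.0, 0.1, size=(n_draws, n_dayparts))
bt_draws = rng.normal(0.05, 0.01, size=(n_draws, n_dayparts))

dayparts = np.array([0, 3, 5])           # hypothetical daypart indices
temp_ts = np.array([10.0, 20.0, 15.0])   # hypothetical temperatures

# Plug-in approach: average the parameters, then predict once
pred_plugin = a_draws.mean(0)[dayparts] + bt_draws.mean(0)[dayparts] * temp_ts

# Per-draw approach: predict for every draw, then summarize
pred_draws = a_draws[:, dayparts] + bt_draws[:, dayparts] * temp_ts
pred_mean = pred_draws.mean(0)

# For a linear predictor the two means coincide (up to floating point)
print(np.allclose(pred_plugin, pred_mean))
```

The per-draw array (pred_draws) is also what you would feed to az.hdi to get intervals around the predictions, which the plug-in approach cannot provide.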


Thanks again for your answers @OriolAbril @jhrcook. I carefully read the posts and the notebook mentioned. So, from what I understand, sampling variables of the posterior to obtain predictions does not make sense. Instead, the following methods should provide the correct predictions:

1 - sampling the posterior predictive distribution using pm.sample_posterior_predictive(partial_pooling_trace)["y"].

2 - manually extracting the individual parameter means from the trace, like I did in the second approach I posted earlier, and then computing each prediction with predictions_2 = a_daypart_means_2[np.ix_(dayparts)] + bt_daypart_means_2[np.ix_(dayparts)] * temp_ts

3 - manually modifying the data to include every possible combination of the independent variables once, as suggested in Oriol’s notebook. But since one of my independent variables is a long time series of continuous data, I think this option might not work well in my case (it seems more oriented towards categorical data).

Am I right that the first option is the most ‘correct’ one? Or is the second a valid alternative as well?

Thanks again for the enlightening comments and links!