Hi all!
I’m working on a probabilistic model to predict (past) energy consumption time series with hourly frequency in buildings, based on time features and outdoor temperature. A simplified version of the mu of the model would look like this:
mu = (
    a[dayhour, weekday]
    + btc[daypart] * (outdoor_temp - tbal_c) * ((outdoor_temp - tbal_c) > 0)
    + bth[daypart] * (tbal_h - outdoor_temp) * ((tbal_h - outdoor_temp) > 0)
)
That is, I have an intercept that depends on the hour of the day and the day of the week, plus a term that depends on outdoor temperature. In the temperature term, I define two balance temperatures as model parameters: one for cooling (tbal_c) and one for heating (tbal_h).
This means that when the outdoor temperature is higher than the cooling balance temperature, i.e. (outdoor_temp - tbal_c) > 0, electricity consumption increases in proportion to the temperature difference. The same applies when the temperature is lower than the heating balance temperature, i.e. (tbal_h - outdoor_temp) > 0.
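To make the temperature term concrete, here is a minimal NumPy sketch of the two hinge terms. The function name temperature_term and all numeric values are made up for illustration (they are not fitted values), and the daypart indexing is left out to keep it short.

```python
import numpy as np

def temperature_term(outdoor_temp, tbal_c, tbal_h, btc, bth):
    """Cooling and heating contributions to mu for an array of hourly temperatures."""
    # x * (x > 0) is the same as max(x, 0): the term is zero below/above the balance point
    cooling = btc * np.maximum(outdoor_temp - tbal_c, 0.0)
    heating = bth * np.maximum(tbal_h - outdoor_temp, 0.0)
    return cooling + heating

# Illustrative values only: balance temperatures of 22 and 15 degrees C
temps = np.array([5.0, 18.0, 30.0])
term = temperature_term(temps, tbal_c=22.0, tbal_h=15.0, btc=2.0, bth=1.5)
print(term)  # heating active at 5, nothing at 18, cooling active at 30
```

Between the two balance temperatures neither term is active, so only the intercept a[dayhour, weekday] contributes to mu.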
Now, I would expect that on certain days of the week (for example weekends) the building might not be occupied, and therefore consumption would not depend on the outdoor temperature (the climatization equipment is not running). To model this, I thought of adding two extra parameters, dep_h and dep_c, so that this dependence can be detected automatically by the model. These two variables would have a Bernoulli distribution with p=0.5:
dep_h = pm.Bernoulli("dep_h", p=0.5, dims="weekday")
dep_c = pm.Bernoulli("dep_c", p=0.5, dims="weekday")
mu = (
    a[dayhour, weekday]
    + btc[daypart] * (outdoor_temp - tbal_c) * ((outdoor_temp - tbal_c) > 0) * dep_c[weekday]
    + bth[daypart] * (tbal_h - outdoor_temp) * ((tbal_h - outdoor_temp) > 0) * dep_h[weekday]
)
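The effect of the per-weekday switches can be illustrated outside of PyMC with plain NumPy. In this sketch the switch values are fixed by hand (weekdays on, weekend off) rather than sampled, and the coefficients and temperatures are invented for illustration; the daypart index is again omitted.

```python
import numpy as np

# Weekday index of each hourly observation (0 = Monday, ..., 6 = Sunday)
weekday = np.array([0, 0, 5, 5, 6])
outdoor_temp = np.array([30.0, 5.0, 30.0, 5.0, 30.0])

# Hypothetical switch values: temperature dependence on weekdays, none on weekends
dep_c = np.array([1, 1, 1, 1, 1, 0, 0])
dep_h = np.array([1, 1, 1, 1, 1, 0, 0])

# Illustrative coefficients and balance temperatures (not fitted)
btc, bth, tbal_c, tbal_h = 2.0, 1.5, 22.0, 15.0

# dep_c[weekday] picks, for each observation, the switch of its weekday
cooling = btc * np.maximum(outdoor_temp - tbal_c, 0.0) * dep_c[weekday]
heating = bth * np.maximum(tbal_h - outdoor_temp, 0.0) * dep_h[weekday]
gated_term = cooling + heating
print(gated_term)  # weekend observations contribute nothing
```

Every observation on a given weekday shares the same switch, which is why the switches have dims="weekday" rather than one value per observation.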
Now, while this works when obtaining the posterior using NUTS, discrete priors are not accepted when using ADVI. To emulate a similar behaviour, I therefore opted for the following model specification:
dep_h = pm.Uniform("dep_h", lower=0, upper=1, dims="weekday")
dep_c = pm.Uniform("dep_c", lower=0, upper=1, dims="weekday")
mu = (
    a[dayhour, weekday]
    + btc * (outdoor_temp - tbal_c) * ((outdoor_temp - tbal_c) > 0) * (dep_c[weekday] > 0.5)
    + bth * (tbal_h - outdoor_temp) * ((tbal_h - outdoor_temp) > 0) * (dep_h[weekday] > 0.5)
)
So my questions to you probabilistic programming experts are the following:
1 - Does it make sense to let the model automatically detect when there is temperature dependence in the building through the coefficients dep_h and dep_c?
2 - If yes, is there a smarter/more efficient way to include this information when using ADVI than the one I’m currently using (uniform prior between 0 and 1, and detecting dependence when the coefficient is higher than 0.5)? I found a similar question in this post but I could not really understand the ‘marginalisation trick’ the post is referring to, in the case of latent Bernoulli variables.
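For context, marginalising a latent Bernoulli variable generally means summing it out of the likelihood, so that only continuous parameters remain. The sketch below shows that idea in NumPy for a single switch on a single weekday. It assumes Gaussian noise with a known sigma and invented toy data, and the helper names normal_logpdf and marginal_loglik are hypothetical; it is not necessarily what the linked post does.

```python
import numpy as np

def normal_logpdf(y, mu, sigma):
    """Elementwise log density of Normal(mu, sigma) at y."""
    return -0.5 * np.log(2.0 * np.pi * sigma**2) - 0.5 * ((y - mu) / sigma) ** 2

def marginal_loglik(y, mu_on, mu_off, sigma, p=0.5):
    """Log-likelihood of one weekday's data with its on/off switch summed out.

    All observations of the weekday share the switch, so the sum over the two
    switch states is taken over the joint likelihood, not per observation.
    """
    ll_on = np.log(p) + normal_logpdf(y, mu_on, sigma).sum()
    ll_off = np.log1p(-p) + normal_logpdf(y, mu_off, sigma).sum()
    log_marginal = np.logaddexp(ll_on, ll_off)   # log(exp(ll_on) + exp(ll_off))
    post_on = np.exp(ll_on - log_marginal)       # P(switch = 1 | data, parameters)
    return log_marginal, post_on

# Toy data for one weekday: consumption sits at the baseline
y = np.array([10.2, 9.8, 10.1])
mu_off = np.full(3, 10.0)   # intercept only
mu_on = np.full(3, 16.0)    # intercept plus temperature term
log_marginal, post_on = marginal_loglik(y, mu_on, mu_off, sigma=1.0)
print(log_marginal, post_on)  # post_on is close to 0: the data favour "off"
```

In a PyMC model, a term like log_marginal would presumably be added to the model log-probability (for example through pm.Potential) in place of the discrete variable, which keeps every parameter continuous for ADVI. With both dep_h and dep_c per weekday, the sum would run over four switch combinations instead of two.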
Sorry for the long post, I tried to keep it as short as possible while still providing all the relevant information.
Any help will be greatly appreciated, thanks in advance!