Hello,
I want to model the occupancy by M electric vehicles (EV) of a set of N EV battery charging stations in a parking lot, for which I have historical data on arrival time, departure time and battery charging per EV per day.
I would like to have a model per EV (as the driving habits and the technical specs of the EV battery are each quite specific).
I was thinking about modeling, on one side, whether the EV is present in the parking lot on a given day (yes/no) and, if present, the arrival time (usually in the morning) and either the departure time (usually in the afternoon) or the duration (time spent in the parking lot, with departure time = arrival time + duration).
I haven’t used PyMC beyond the tutorial yet. I have seen the "Passenger arrival rate partial pooling" model, which could be similar in spirit.
So my question is: given the data history of a specific EV, how can I build a model and train it if I assume:
- presence = Categorical
- arrival time = Normal or Uniform
- departure time = Normal or Uniform
- duration = departure time - arrival time
A first sample of code would greatly help me get started!
Thanks in advance.
My suggestion would be to start with a simple linear model to predict duration. What I would do is:
For the M EVs, assume each has a feature vector of k features representing the driving habits, technical specs of the EV battery, etc., which gives an M*k matrix. This is the input.
The output is the duration and the presence. Here I would combine them so that presence=0 --> duration=0. For a single day this is a vector of length M.
Now, say that you record this for 10 days: you will have 10 repeated measures per EV. The model would look something like:
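import pymc3 as pm

with pm.Model() as baseline_model:
    intercept = pm.Normal(...)
    beta = pm.Normal(..., shape=(k, 1))  # one coefficient per feature
    prediction = intercept + Xinput.dot(beta)  # Xinput has shape (M, k), so prediction is (M, 1)
    sd = pm.HalfNormal(...)
    # Lognormal's mu is already on the log scale, so prediction is passed without exponentiating
    observed = pm.Lognormal(..., mu=prediction, sd=sd, observed=duration)  # duration has shape (M, 10)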
Notice that duration is modelled here using a Lognormal, which puts no probability on duration=0, so a zero-inflated gamma would be more appropriate for the days when the EV is absent.
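To make the "zero-inflated gamma" idea concrete, here is a minimal sketch of its log-density in plain Python. It is not a PyMC distribution, just the formula: with probability `p_absent` the duration is exactly 0, otherwise it follows a Gamma. The function name and the shape/rate parameterisation (`alpha`, `beta`) are choices made for this illustration.

```python
import math


def zero_inflated_gamma_logp(duration, p_absent, alpha, beta):
    """Log-density of a zero-inflated gamma for one day's parking duration.

    duration : hours parked (0 means the EV did not come)
    p_absent : probability that the EV is absent on a given day
    alpha    : Gamma shape parameter
    beta     : Gamma rate parameter (mean duration when present = alpha / beta)
    """
    if duration == 0:
        # Point mass at zero: the EV was not there.
        return math.log(p_absent)
    # Standard Gamma log-density in the shape/rate parameterisation.
    gamma_logp = (
        alpha * math.log(beta)
        - math.lgamma(alpha)
        + (alpha - 1) * math.log(duration)
        - beta * duration
    )
    # The EV was present (probability 1 - p_absent) and stayed `duration` hours.
    return math.log1p(-p_absent) + gamma_logp


# Illustrative parameter values only (hypothetical, not fitted to anything):
# absent 14% of days, about 10 hours on average when present.
durations = [9.3, 0.0, 10.2, 11.5]
total_logp = sum(zero_inflated_gamma_logp(d, 0.14, 100.0, 10.0) for d in durations)
```

Summing this log-density over the observed days gives the log-likelihood that a sampler would explore over `p_absent`, `alpha` and `beta`.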
Thanks @junpenglao for your answer, but I may not have been very clear…
A sample of my data looks like this:
| car/driver | day | arrival | departure | kWh |
| --- | --- | --- | --- | --- |
| #1 | 5/03/2018 | 08:12 | 17:30 | 11,90 |
| #1 | 7/03/2018 | 07:41 | 17:55 | 6,40 |
| #1 | 8/03/2018 | 07:59 | 19:29 | 4,50 |
| #1 | 9/03/2018 | 08:03 | 18:39 | 7,30 |
| #1 | 12/03/2018 | 08:07 | 18:19 | 13,40 |
| #1 | 13/03/2018 | 08:01 | 19:31 | 4,90 |
You can see that car/driver number 1 comes on most weekdays (he did not come on 6/03/2018). He arrives around 8am and departs between 5pm and 7:30pm, and the energy charged into the battery (kWh) is usually between 4 and 7, except on Mondays, when it is higher (more use during the weekend).
How would I go about fitting a model such that:
- some weekdays, the car/driver is not there (like 6/03/2018) => guesstimate for p(car is not in parking) ~ 14% (1 day out of the 7 weekdays observed, 5/03 to 13/03)
- arrival is normal ==> guesstimate N(08:00, 00:10)
- departure is normal ==> guesstimate N(18:33, 00:49)
- kWh ==> guesstimate N(12.65, 1.06) on Monday, and N(5.78, 1.3) on Tu-Fri
For the guesstimates here I just took simple estimates (mean, stdev, …).
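For reference, these simple estimates can be reproduced from the sample table with the Python standard library. This is only a sketch: the helper names `to_minutes` and `to_hhmm` are made up for the illustration, times are converted to minutes since midnight, and `stdev` is the sample standard deviation.

```python
from statistics import mean, stdev


def to_minutes(hhmm):
    """Convert an 'HH:MM' string to minutes since midnight."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def to_hhmm(minutes):
    """Convert minutes since midnight back to an 'HH:MM' string."""
    minutes = round(minutes)
    return f"{minutes // 60:02d}:{minutes % 60:02d}"


# Car/driver #1, copied from the sample table.
arrivals = ["08:12", "07:41", "07:59", "08:03", "08:07", "08:01"]
departures = ["17:30", "17:55", "19:29", "18:39", "18:19", "19:31"]
kwh_monday = [11.9, 13.4]          # 5/03 and 12/03
kwh_other = [6.4, 4.5, 7.3, 4.9]   # Tuesday to Friday

arr = [to_minutes(t) for t in arrivals]
dep = [to_minutes(t) for t in departures]

# 7 weekdays from 5/03 to 13/03, absent on one of them.
p_absent = 1 / 7

arr_mean, arr_sd = mean(arr), stdev(arr)
dep_mean, dep_sd = mean(dep), stdev(dep)

print(f"p(absent)  ~ {p_absent:.0%}")
print(f"arrival    ~ N({to_hhmm(arr_mean)}, {arr_sd:.0f} min)")
print(f"departure  ~ N({to_hhmm(dep_mean)}, {dep_sd:.0f} min)")
print(f"kWh Monday ~ N({mean(kwh_monday):.2f}, {stdev(kwh_monday):.2f})")
print(f"kWh Tu-Fri ~ N({mean(kwh_other):.2f}, {stdev(kwh_other):.2f})")
```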
I would like to use PyMC to do these estimates and then to enhance the model with more complex relations (e.g. arrival on day D depends on arrival on day D-1, etc.).
I guess the simplest thing to try first is to model each of the M cars separately, with uniform priors on all the parameters you would like to infer. The model would be more or less what you wrote down above.