Model with decisions

This is a rather theoretical question. One thing that is nice in PyMC is that you can alternate between model (sampling) steps and other steps.

Imagine we have a time series y_t. We want to loop through it, estimate parameters \theta, and optimise some loss that results from making decisions based on the model.

This is a kind of stochastic optimisation, which is hard because the objective is itself random: evaluating the loss twice with the same parameters can give different values.

Below is some pseudocode. Let’s assume three latent states, which we model with a categorical distribution. Based on the state, we make a decision and record the outcome (add it to the loss).

X is a matrix of historic information.

How can I estimate and optimise such a system? Are there any examples from the literature?

with model:
  theta = prior
  loss = 0
  for t in 1:N:
    state = Categ(f(X_t, theta))  # one of the three latent states
    if state == ...: # loop through states
      decision = z
    loss += decision * actual(y_t)  # accumulate the outcome of each decision
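
To make the pseudocode above concrete, here is a rough NumPy sketch of one forward pass for a fixed theta. Everything specific in it is an assumption for illustration only: f is taken to be a softmax of a linear map, each state maps to a fixed decision through a lookup table, and actual(y_t) is simply y_t. It is meant to show why the loss is a noisy function of theta, not how to fit the model.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over a 1-D array.
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def simulate_loss(theta, X, y, decisions, rng):
    """One noisy evaluation of the cumulative loss for a fixed theta."""
    loss = 0.0
    for t in range(len(y)):
        probs = softmax(X[t] @ theta)            # hypothetical f(X_t, theta)
        state = rng.choice(len(probs), p=probs)  # state ~ Categorical(probs)
        decision = decisions[state]              # hypothetical state -> decision rule
        loss += decision * y[t]                  # actual(y_t) assumed to be y_t
    return loss

rng = np.random.default_rng(0)
N, n_features, n_states = 50, 2, 3
X = rng.normal(size=(N, n_features))             # synthetic "historic information"
y = rng.normal(size=N)                           # synthetic time series
theta = rng.normal(size=(n_features, n_states))  # one fixed parameter value
decisions = np.array([-1.0, 0.0, 1.0])           # one decision per latent state

# Same theta, repeated evaluations: the loss varies from run to run.
losses = np.array([simulate_loss(theta, X, y, decisions, rng) for _ in range(200)])
print("mean loss:", losses.mean(), "std of loss:", losses.std())
```

The spread of `losses` is the "random part" mentioned above: any optimiser over theta only ever sees noisy evaluations like these, so it has to average over many draws or work with expectations.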

I think the usual approach is to estimate the posterior distribution of some outcomes and use that to make the optimal decision under some loss function. @twiecki and @RavinKumar wrote a bit more about this in https://twiecki.io/blog/2019/01/14/supply_chain/.
I am not sure optimizing the decision loss during the modeling would make sense, as the loss is usually not directly observed, but it would probably work if we also had historical decisions and losses to calibrate the model.
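
As a minimal sketch of the "posterior first, decision second" idea, assuming the three-state setup from the question: take posterior draws of the state probabilities, compute the expected loss of each candidate decision, and pick the smallest. The Dirichlet draws below are only a stand-in for draws from a fitted model, and `loss_table` is a made-up example, so treat the numbers as placeholders.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in for posterior draws of the state probabilities at one time step,
# shape (n_draws, n_states). In practice these would come from the fitted model.
post_probs = rng.dirichlet(alpha=[2.0, 5.0, 1.0], size=4000)

# Hypothetical loss table: loss_table[d, s] is the loss of decision d in state s.
loss_table = np.array([
    [0.0, 2.0, 5.0],
    [1.0, 0.0, 2.0],
    [4.0, 1.0, 0.0],
])

# Expected loss of each decision under each draw, shape (n_draws, n_decisions),
# then averaged over the posterior draws.
expected_loss = (post_probs @ loss_table.T).mean(axis=0)
best_decision = int(np.argmin(expected_loss))

print("expected loss per decision:", expected_loss)
print("best decision:", best_decision)
```

The model is fitted without any reference to the loss here. The loss only enters afterwards, when the posterior is turned into a decision, which avoids having to optimise through the sampler.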