Estimate dynamic discrete choice model

I want to use PyMC to estimate the parameters of a Dynamic Discrete Choice Model. The model consists of states and actions, and the utility function is defined as

[utility function shown as an image in the original post]
where s is the state and p is the action.
The likelihood function is defined as

[likelihood equation shown as an image in the original post]
And the measure of model fit in Bayesian statistics is the marginal density of the
data according to the model, defined as
p(\text{data} \mid \text{model}) = \int p(\text{data} \mid \theta) \, p(\theta) \, d\theta
Can I use PyMC to estimate the model?

If I’m reading the setup right, the challenge I see is that you need to know the value function by solving (3) every time you change the parameters. Do you have a closed form solution for it? Otherwise you will need to do VFI (or similar) on each parameter draw. PyMC can do this, but not in a way that would give you gradients (i.e. you lose NUTS).

Once you have V_a(s | \theta_i), though, PyMC could do it no problem. The solution would be a table of size (states, actions), and (5) is just a softmax over the action dimension. Then you can use a pm.Potential to insert the P(\text{data} | \theta) term into the likelihood. PyMC would handle the prior part for you.
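For concreteness, here is a rough sketch of what that could look like. The value table below is just a stand-in built from the parameters so the example runs; in your model it would come from solving the Bellman equation at each draw:

import numpy as np
import pymc as pm
import pytensor.tensor as pt

# Toy dimensions and fake observations, only to make the sketch runnable
n_states, n_actions, n_obs = 4, 3, 50
rng = np.random.default_rng(0)
state_idx = rng.integers(n_states, size=n_obs)    # observed state of each choice
action_idx = rng.integers(n_actions, size=n_obs)  # observed action of each choice

with pm.Model():
    theta = pm.Normal("theta", 0.0, 1.0, shape=2)

    # Stand-in for V_a(s | theta): in the real model this (states, actions)
    # table comes from solving the Bellman equation at the current theta
    V = pt.outer(pt.arange(n_states), theta[0] + theta[1] * pt.arange(n_actions))

    # (5): choice probabilities are a softmax over the action dimension
    probs = pm.math.softmax(V, axis=-1)

    # log P(data | theta): sum of log-probabilities of the observed choices
    loglike = pt.log(probs[state_idx, action_idx]).sum()
    pm.Potential("likelihood", loglike)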

Thanks! So what is VFI?

So VFI is Value Function Iteration? Do you know how to implement that in PyMC?

Yes, VFI is value function iteration. There is a very nice collection of lectures about numerical solutions to Bellman equations on QuantEcon, starting from lecture 43 until lecture 48.
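If it helps, here is a minimal, generic VFI sketch for a discrete-state, discrete-action problem (the function name and array layout are just for illustration; in a dynamic logit model the max over actions is typically replaced by a log-sum-exp):

import numpy as np

def value_function_iteration(u, P, beta, tol=1e-8, max_iter=10_000):
    """Solve V(s) = max_a [u(s, a) + beta * E[V(s') | s, a]] by iteration.

    u    : (n_states, n_actions) flow utilities
    P    : (n_actions, n_states, n_states) transitions, P[a, s, s']
    beta : discount factor in (0, 1)

    Returns the converged table of action-specific values Q[s, a].
    """
    n_states, n_actions = u.shape
    V = np.zeros(n_states)
    for _ in range(max_iter):
        # Q[s, a] = u(s, a) + beta * sum_s' P[a, s, s'] * V(s')
        Q = u + beta * np.einsum("asj,j->sa", P, V)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return Q
        V = V_new
    raise RuntimeError("VFI did not converge")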

Once you have a function that implements VFI, you would wrap it in a PyTensor Op and call it within your model, similar to this example. In the example, the author implements a likelihood function and puts it into a pm.Potential, but you can follow the exact same procedure to run arbitrary Python code inside your PyMC model.
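As a sketch of the wrapping step (SolveDDC and solve_value_function are hypothetical names; the solver is any Python function that returns the value table for a given parameter vector):

import numpy as np
import pytensor.tensor as pt

class SolveDDC(pt.Op):
    """Wrap a Python VFI solver so PyMC can call it on each parameter draw."""

    itypes = [pt.dvector]   # parameter vector theta
    otypes = [pt.dmatrix]   # value table V_a(s | theta), shape (states, actions)

    def __init__(self, solve_value_function):
        self.solve = solve_value_function

    def perform(self, node, inputs, outputs):
        (theta,) = inputs
        outputs[0][0] = np.asarray(self.solve(theta), dtype="float64")

Inside the model you would then call V = SolveDDC(my_solver)(theta) and build the softmax likelihood on top of V. Without a grad method, an Op like this only works with gradient-free samplers.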

Thanks a lot!

Do you know how to write the softmax likelihood function?

Use pm.Categorical as the likelihood. The softmax computes the class (choice) probabilities; it's pm.math.softmax.

Thanks! Since there are many states, should I write multiple pm.Categorical?

No, you should gather everything together. From the equations it looks like you are trying to infer actions? So the states should just be integer indexes, and your data should be a matrix with dimensions (observation, time), is that right?
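If so, something like this rough sketch should work, with fake (observation, time) matrices of integer states and actions, and again a stand-in value table where your VFI Op would go:

import numpy as np
import pymc as pm
import pytensor.tensor as pt

n_states, n_actions = 4, 3
rng = np.random.default_rng(1)
observed_states = rng.integers(n_states, size=(100, 5))    # (observation, time)
observed_actions = rng.integers(n_actions, size=(100, 5))  # (observation, time)

with pm.Model():
    theta = pm.Normal("theta", 0.0, 1.0, shape=2)

    # Stand-in for the solved value table V_a(s | theta)
    V = pt.outer(pt.arange(n_states), theta[0] + theta[1] * pt.arange(n_actions))
    p = pm.math.softmax(V, axis=-1)   # (n_states, n_actions) choice probabilities

    # A single Categorical over all observations and periods: indexing by the
    # observed states picks the right row of choice probabilities for each cell
    pm.Categorical("choices", p=p[observed_states], observed=observed_actions)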

yes

I am also trying to use PyMC to estimate a Dynamic Discrete Choice Model and followed the above suggestions. However, sampling takes a very long time, so I am afraid I am doing something wrong.

My model is wrapped in a class that can provide the log-likelihood (both the sum and the individual contributions; currently only the sum is used) as well as gradients for a given set of parameters. One likelihood evaluation takes approximately 9 ms, and model estimation with estimagic (algorithm: scipy_lbfgsb) takes approx. 10 s.

Now I tried to implement the suggested custom likelihood function, and pm.sample is expected to run for 10+ hours. I would highly appreciate any suggestions about what I may be doing wrong. Here is my code:

import numpy as np
import pymc as pm
import pytensor.tensor as pt


class LogLike(pt.Op):
    itypes = [pt.dvector]  # expects a vector of parameter values when called
    otypes = [pt.dscalar]  # outputs a single scalar value (the log likelihood)

    def __init__(self, model):
        self.model = model
        self.ll_grad = LogLikeGrad(model)

    def perform(self, node, inputs, outputs):
        (theta,) = inputs
        logl = self.model.get_ll(theta)
        outputs[0][0] = np.array(logl)

    def grad(self, inputs, g):
        (theta,) = inputs
        # not sure if I should use g for anything here?
        return [self.ll_grad(theta)]

class LogLikeGrad(pt.Op):

    """
    This Op will be called with a vector of values and also return a vector of
    values - the gradients in each dimension.
    """

    itypes = [pt.dvector]
    otypes = [pt.dvector]

    def __init__(self, model):

        # add inputs as class attributes
        self.model = model

    def perform(self, node, inputs, outputs):
        (theta,) = inputs

        # calculate gradients
        grads = self.model.get_grad(theta)
        outputs[0][0] = grads

logl = LogLike(hm) # hm is a model class that already contains the data and methods to compute likelihood and gradients

with pm.Model():
    # priors are centered around maximum likelihood values
    alpha = pm.Normal("alpha", mu=-3, sigma=2.0)
    b1 = pm.Normal("b1", mu=-5.0, sigma=2.0)
    b2 = pm.Normal("b2", mu=1.0, sigma=2.0)

    theta = pt.as_tensor_variable([alpha, b1, b2])

    pm.Potential("likelihood", logl(theta))

    idata_grad = pm.sample()

Because the NUTS algorithm simulates a Hamiltonian system, it does many, many gradient evaluations to generate a single sample. If you have a computationally expensive likelihood (and gradient), it will be quite slow.

I don’t know anything about the model object, but is it a numba jitclass? If not, can it be? To me, this is the first obvious step. It might not be possible to jit the entire class. If not, can you just jit the likelihood function and gradient, then pass in the required member variables as arguments to LogLike.__init__? (I like numba, but you could also jit the function using jax then link it up to PyMC by following the procedure in this blog)
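To illustrate the restructuring I mean: the data the class holds gets passed into the Op once, and perform only calls a standalone jitted function. The names are hypothetical, and the function body below is a placeholder static logit log-likelihood, not your actual VFI-based one:

import numpy as np
import pytensor.tensor as pt
from numba import njit

@njit(cache=True)
def loglike_jit(theta, states, actions):
    # Placeholder static logit log-likelihood, standing in for the real
    # VFI-based one; states and actions are integer arrays
    ll = 0.0
    for i in range(states.shape[0]):
        v1 = theta[0] + theta[1] * states[i]      # utility of action 1
        p1 = 1.0 / (1.0 + np.exp(-v1))            # logit choice probability
        ll += np.log(p1) if actions[i] == 1 else np.log(1.0 - p1)
    return ll

class LogLikeJitted(pt.Op):
    itypes = [pt.dvector]
    otypes = [pt.dscalar]

    def __init__(self, states, actions):
        # pass the data in explicitly instead of a model object
        self.states = states
        self.actions = actions

    def perform(self, node, inputs, outputs):
        (theta,) = inputs
        outputs[0][0] = np.array(loglike_jit(theta, self.states, self.actions))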

@aseyboldt is the expert on gradients, but I think you need to return g[0] * self.ll_grad(theta), because grad should actually return the vector-Jacobian product (the "pullback" in differential geometry), not simply the gradient. But I could be wrong here.
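If that's right, your Op would look something like this (only the grad method changes; LogLikeGrad is as you defined it):

import numpy as np
import pytensor.tensor as pt

class LogLike(pt.Op):
    itypes = [pt.dvector]
    otypes = [pt.dscalar]

    def __init__(self, model):
        self.model = model
        self.ll_grad = LogLikeGrad(model)   # LogLikeGrad as defined above

    def perform(self, node, inputs, outputs):
        (theta,) = inputs
        outputs[0][0] = np.array(self.model.get_ll(theta))

    def grad(self, inputs, output_gradients):
        (theta,) = inputs
        # chain rule: scale the log-likelihood gradient by the scalar
        # gradient flowing in from the rest of the graph
        return [output_gradients[0] * self.ll_grad(theta)]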

Thank you for the prompt reply! I realized that there are far more likelihood evaluations than should be necessary to estimate the parameters (the scipy_lbfgsb algorithm uses 100 evaluations of the likelihood and gradient to estimate the parameters). I am planning to use the Bayesian framework to estimate varying coefficients, so my final model will have many parameters to estimate. Therefore, I hope to get a very efficient implementation that runs fast for this simple base case with only 3 parameters.

My model class is already a jitclass, and it implements a dynamic logit model with VFI (it only takes ~10 ms per evaluation, which I believe is quite fast given the complexity of VFI). Would it make any difference if I returned the vector of choice probabilities and gradients and used pm.Bernoulli as the likelihood? My goal is to avoid redundant likelihood/gradient evaluations.

Thank you for your support!

I didn't mean that there are more evaluations than necessary, just that there can be hundreds per sample. This can add up quickly, especially if the posterior geometry is complex or degenerate. As an aside, if the gradients are wrong, it will result in bad sampling, so it's important to get that right. There is a function in PyTensor to verify your gradients against a finite-difference approximation: pytensor.gradient.verify_grad.
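For example, something along these lines should work, where logl is the LogLike instance from your model (verify_grad wants an explicit rng argument, and depending on your PyTensor version that may need to be a numpy RandomState or Generator; it raises if the symbolic and finite-difference gradients disagree):

import numpy as np
import pytensor

# check the custom Op's gradient against finite differences at a test point
theta_test = np.array([-3.0, -5.0, 1.0])
pytensor.gradient.verify_grad(logl, [theta_test], rng=np.random.RandomState(42))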

I do agree 10 ms is fast, but you can still go faster. If you can adapt it to your problem, the endogenous grid method (EGM) is orders of magnitude faster than VFI. Time iteration is also faster than VFI, but slower than EGM. These require specific problem setups, though.

I guess one way to check whether the computational bottleneck is in repeated gradient evaluations would be to run the model using a gradient-free method, for example SMC. You could also check that the model can recover known parameters from simulated data.
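For the SMC check you can reuse your model as-is and just swap the sampler, e.g. (logl is the LogLike instance from your post):

import pymc as pm
import pytensor.tensor as pt

with pm.Model():
    alpha = pm.Normal("alpha", mu=-3, sigma=2.0)
    b1 = pm.Normal("b1", mu=-5.0, sigma=2.0)
    b2 = pm.Normal("b2", mu=1.0, sigma=2.0)

    theta = pt.as_tensor_variable([alpha, b1, b2])
    pm.Potential("likelihood", logl(theta))

    # SMC never calls the Op's grad, so this isolates the cost of the
    # likelihood itself from the cost of repeated gradient evaluations
    idata_smc = pm.sample_smc()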
