Inconsistencies w/ R in Discrete Choice and Random Utility PyMC example?

Greetings,

I am relatively new to pymc and am trying to construct a discrete choice model of heating system adoption. I found this very relevant and helpful example. However, I noticed that the parameter estimates in this example end up being quite different from those of the R mlogit package (here), which uses the same dataset.

In particular, the Basic Model in the pymc example generates a positive coefficient value for the installed cost predictor (beta_ic=0.002, sd=0 in pymc vs. beta_ic=-0.00623187, std. error=0.00035277 in the analogous R result). Even after improving the model specification in the Improved Model, the coefficient remains positive and close to zero (while continuing to be negative in the mlogit example). The author attempts to justify the result, but I suspect something is wrong with the model specification, because it doesn’t make sense for increased equipment installed cost to translate into increased probability of adoption (and again, this is not the case in the mlogit model).

Can anyone help diagnose what is going on, and whether this model might be misspecified? (I noted another thread in which model misspecification resulted in similar issues, but I have been unable to translate the fix to this case.)

Thank you in advance!

Welcome! Perhaps @Nathaniel_Forde can help as he developed this notebook.

Hi JTL,

Thanks for the feedback on the notebook. You’re correct that the results in the early models are in contradiction with theory. The idea of the notebook was to show escalating complexity in the models and arrive at the point in the final model where you use priors to constrain the estimates more effectively.

I suspect you could probably improve things for the earlier models by having tighter priors on the beta parameters. I was using Normal(0, 1) in this context, and that seems quite wide in retrospect.

Alternatively, you could use a negatively constrained prior, as I show in the last model, where you force the price effect to be realised as a negative value.
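
For instance, a minimal sketch of what each option could look like (the scales here are my own illustrative choices, not the values used in the notebook):

import pymc as pm

with pm.Model():
    # Option 1: a much tighter prior than Normal(0, 1) on the price coefficient,
    # shrinking the estimate towards zero rather than letting it drift positive.
    beta_ic_tight = pm.Normal("beta_ic_tight", 0, 0.01)

    # Option 2: one way to force a negative installed-cost effect is to negate a
    # HalfNormal, so the prior only has support on (-inf, 0].
    beta_ic_neg = pm.Deterministic("beta_ic_neg", -pm.HalfNormal("beta_ic_raw", 0.01))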

Finally, for a Bayesian model I would generally focus more on the range of the posterior than on a comparison of point estimates with R. The estimation methods are different, and the Bayesian fit incorporates the uncertainty I’ve embedded in the prior setting. So to resolve this, I’d play around with how you set the priors on the price coefficient and perform a kind of sensitivity analysis.
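
As a rough sketch of what that sensitivity analysis could look like (this assumes wide_heating_df is loaded as in the notebook; the helper function and prior scales below are just illustrative):

import arviz as az
import pandas as pd
import pymc as pm

observed = pd.Categorical(wide_heating_df["depvar"]).codes
alts = ["ec", "er", "gc", "gr", "hp"]


def fit_basic_model(prior_sd):
    """Refit the basic utility model with a given prior sd on the price coefficients."""
    with pm.Model():
        beta_ic = pm.Normal("beta_ic", 0, prior_sd)
        beta_oc = pm.Normal("beta_oc", 0, prior_sd)
        utilities = [
            beta_ic * wide_heating_df[f"ic.{a}"] + beta_oc * wide_heating_df[f"oc.{a}"]
            for a in alts
        ]
        p = pm.math.softmax(pm.math.stack(utilities).T, axis=1)
        pm.Categorical("y_cat", p=p, observed=observed)
        return pm.sample(random_seed=101)


# Compare how the posterior for the installed-cost coefficient moves as the prior tightens.
for sd in [1.0, 0.1, 0.01]:
    idata = fit_basic_model(sd)
    print(f"prior sd = {sd}")
    print(az.summary(idata, var_names=["beta_ic", "beta_oc"]))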

Actually, on second thought, I think there is an error in the model specification that needs a fix.

The issue relates to the zeroed-out data for the hp alternative. Model identification requires that we zero out parameters that make it hard for the model to sample, but since the first model here only has fixed beta coefficients for each of the covariates, that is not an issue and we can include the hp alternative.

import pandas as pd
import pymc as pm

# wide_heating_df is the wide-format heating dataset, loaded as in the example notebook
N = wide_heating_df.shape[0]
observed = pd.Categorical(wide_heating_df["depvar"]).codes
coords = {
    "alts_probs": ["ec", "er", "gc", "gr", "hp"],
    "obs": range(N),
}

with pm.Model(coords=coords) as model_1:
    beta_ic = pm.Normal("beta_ic", 0, 1)
    beta_oc = pm.Normal("beta_oc", 0, 1)

    ## Construct Utility matrix and Pivot
    u0 = beta_ic * wide_heating_df["ic.ec"] + beta_oc * wide_heating_df["oc.ec"]
    u1 = beta_ic * wide_heating_df["ic.er"] + beta_oc * wide_heating_df["oc.er"]
    u2 = beta_ic * wide_heating_df["ic.gc"] + beta_oc * wide_heating_df["oc.gc"]
    u3 = beta_ic * wide_heating_df["ic.gr"] + beta_oc * wide_heating_df["oc.gr"]
    u4 = beta_ic * wide_heating_df["ic.hp"] + beta_oc * wide_heating_df["oc.hp"]
    s = pm.math.stack([u0, u1, u2, u3, u4]).T

    ## Apply Softmax Transform
    p_ = pm.Deterministic("p", pm.math.softmax(s, axis=1), dims=("obs", "alts_probs"))

    ## Likelihood
    choice_obs = pm.Categorical("y_cat", p=p_, observed=observed, dims="obs")

    idata_m1 = pm.sample_prior_predictive()
    idata_m1.extend(
        pm.sample(nuts_sampler="numpyro", idata_kwargs={"log_likelihood": True}, random_seed=101)
    )
    idata_m1.extend(pm.sample_posterior_predictive(idata_m1))

pm.model_to_graphviz(model_1)

This gives us:

Then, when we get to the expanded model with the alternative-specific intercepts, we have to zero out the problematic parameters, which in this case is just the alternative-specific intercept for the hp alternative:

N = wide_heating_df.shape[0]
observed = pd.Categorical(wide_heating_df["depvar"]).codes
coords = {
    "alts_probs": ["ec", "er", "gc", "gr", "hp"],
    "alts_intercepts": ["ec", "er", "gc", "gr"],
    "obs": range(N),
}

with pm.Model(coords=coords) as model_2:
    beta_ic = pm.Normal("beta_ic", 0, 1)
    beta_oc = pm.Normal("beta_oc", 0, 1)
    alphas = pm.Normal("alpha", 0, 1, dims="alts_intercepts")

    ## Construct Utility matrix and Pivot
    u0 = alphas[0] + beta_ic * wide_heating_df["ic.ec"] + beta_oc * wide_heating_df["oc.ec"]
    u1 = alphas[1] + beta_ic * wide_heating_df["ic.er"] + beta_oc * wide_heating_df["oc.er"]
    u2 = alphas[2] + beta_ic * wide_heating_df["ic.gc"] + beta_oc * wide_heating_df["oc.gc"]
    u3 = alphas[3] + beta_ic * wide_heating_df["ic.gr"] + beta_oc * wide_heating_df["oc.gr"]
    u4 = 0 + beta_ic * wide_heating_df["ic.hp"] + beta_oc * wide_heating_df["oc.hp"]  # hp intercept fixed at 0 for identification
    s = pm.math.stack([u0, u1, u2, u3, u4]).T

    ## Apply Softmax Transform
    p_ = pm.Deterministic("p", pm.math.softmax(s, axis=1), dims=("obs", "alts_probs"))

    ## Likelihood
    choice_obs = pm.Categorical("y_cat", p=p_, observed=observed, dims="obs")

    idata_m2 = pm.sample_prior_predictive()
    idata_m2.extend(
        pm.sample(nuts_sampler="numpyro", idata_kwargs={"log_likelihood": True}, random_seed=101)
    )
    idata_m2.extend(pm.sample_posterior_predictive(idata_m2))

pm.model_to_graphviz(model_2)

which gives us an estimate more in line with mlogit, as you report.
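
For reference, a quick way to compare the posterior against the mlogit point estimates (assuming ArviZ is available; idata_m2 is the trace sampled in the block above):

import arviz as az

# Posterior means and HDIs for the coefficients and intercepts; beta_ic should
# now come out negative, much closer to the mlogit estimate of roughly -0.0062.
az.summary(idata_m2, var_names=["beta_ic", "beta_oc", "alpha"])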

Thanks so much for flagging this! I’ll make a PR to have it fixed in the example notebook.

@Nathaniel_Forde thanks so much for the follow up, and my apologies for not responding to your first message – I was going to ask about why an uninformative prior wouldn’t end up producing similar estimates to the mlogit example, but then ran into a configuration issue with pymc after updating my OS and hadn’t had a chance to rerun the example. I’ll work through your updated approach here and try to reproduce your results. Thanks again!

PS: I suspect you’ve already made the revisions to the example b/c the results for the first model look similar to what you have above. Just noting that the subsequent paragraph still refers to a positive installed cost coefficient:

Our parameter estimates differ slightly from the reported estimates, but we agree the model is inadequate. We will show a number of Bayesian model checks to demonstrate this fact, but the main call out is that the parameter values for installation costs should probably be negative. It’s counter-intuitive that a βic increase in price would increase the utility generated by the installation, even marginally as here, although we might imagine that some kind of quality assurance comes with price, which drives satisfaction with higher installation costs. The coefficient for repeat operating costs is negative, as expected. Putting this issue aside for now, we’ll show below how we can incorporate prior knowledge to adjust for this kind of conflict with theory.


Thanks JTL, yeah, I updated the code and missed that reference. The code should be good now, though. Thanks again for flagging it.

Could it be something as simple as the R package giving you a maximum likelihood estimate while PyMC gives you a Bayesian estimate? These can differ by much more than in the example you showed.

How do you get sd = 0 in a PyMC model? Is that just rounding to zero?

This is easy to see with a simple Bernoulli model. Suppose I have observations $y = [1 \ 0 \ 1 \ 0 \ 0]$ and a model with likelihood $y \sim \text{Bernoulli}(\theta)$ and prior $\theta \sim \text{Beta}(1, 1)$ (i.e., uniform). The MLE is going to be $\theta^* = 0.4$. The Bayesian posterior is $\text{Beta}(3, 4)$, with the mean as the standard Bayesian estimator, $\widehat{\theta} = \frac{3}{3 + 4} \approx 0.43$. So you get different estimates. As the size of $y$ goes to infinity, the estimates will converge.
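
A minimal sketch of that comparison in PyMC (purely illustrative; the Beta(3, 4) posterior is available analytically, so sampling here only confirms the point):

import numpy as np
import pymc as pm

y = np.array([1, 0, 1, 0, 0])

# MLE for a Bernoulli theta is just the sample mean: 0.4.
mle = y.mean()

# Analytic posterior under a Beta(1, 1) prior: Beta(1 + 2, 1 + 3) = Beta(3, 4), mean 3/7.
post_mean_analytic = 3 / 7

# The same model in PyMC; the sampled posterior mean should land near 3/7 (about 0.43).
with pm.Model():
    theta = pm.Beta("theta", 1, 1)            # uniform prior
    pm.Bernoulli("obs", p=theta, observed=y)  # Bernoulli likelihood
    idata = pm.sample(random_seed=1)

print(mle, post_mean_analytic, idata.posterior["theta"].mean().item())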

The Bayesian estimate is nice because, if the model is correct, it minimizes the expected squared error of the estimate. All we know about the MLE is that it makes the data most likely and converges to the right answer asymptotically (as does the Bayesian estimate).