# Hierarchical Multinomial Logit Model (conjoint analysis)

We are using PyMC to model the choice-based conjoint data using a hierarchical multinomial logit model. Our data came from a survey where people chose between two options, where each option had three features. We want to estimate a model with individual level coefficients for each person.

We have a working model. It takes approximately 2 hours with 40 respondents on a MacBook Pro with M1 Max and 64GB memory. Since we are new to PyMC, we aren’t sure if our implementation is efficient. Any suggestions?

The following is our model code:

``````with pm.Model(coords=coords) as hmnl:
# population priors
# K_syn: the number of attributes in an alternative
theta = pm.Normal('theta', mu=0, sigma=1, shape=K_syn)
# LKJ Prior
chol, corr, stds = pm.LKJCholeskyCov(
"chol", n=K_syn, eta=1,
sd_dist=pm.Exponential.dist(1.0),
store_in_trace=False,
compute_corr=True)
# individual prior
beta = pm.MvNormal('beta', mu=theta, chol=chol,
dims='subject_id',
shape=(R_codes.size, K_syn))
# utility
U_left = (X_syn[:, 0, :] * beta[R_idx]).sum(axis=1)
U_right = (X_syn[:, 1, :] * beta[R_idx]).sum(axis=1)
U = pt.math.stack((U_left, U_right), axis=1)
# softmax
U_normed = pt.special.softmax(U, axis=1)
# the expected value of the outcome
y_obs = pm.Categorical('y_obs', p=U_normed, observed=y_syn)
``````

We are testing the model code with the following synthetic data:

``````import numpy as np
import scipy as sp
from itertools import product, combinations
from sklearn.datasets import make_spd_matrix

# ground truth theta's
theta_syn = np.array([0.5, 1, 0.1])
# possible values of each attribute: binary
attribute_vals = np.array([0, 1])
# 3 attributes
K_syn = 3
N_syn = 1000
# Each task has two alternatives
C_syn = 2
# Generate the possible alternatives: pick one of the two values (1/0) from each attribute
alternatives_syn = np.array(list(product(attribute_vals, repeat=K_syn)))
# All possible choice tasks: combination of two distinct alternatives
# Drop those identical choice tasks
for choice_A, choice_B in tasks_syn \
if not np.all(choice_A==choice_B)])
# Generate choice task (two alternatives each)
## generate the index first
size=N_syn, replace=True)
## indexing

## Generate 2 respondents
R_syn = 2
R_arr_syn = np.repeat(np.arange(R_syn), repeats=N_syn//R_syn)
## Now generate the beta values
cov_syn = make_spd_matrix(n_dim=theta_syn.size, random_state=0)
cov_syn = (cov_syn.T * cov_syn)
## Sample individual partworth
beta_syn = np.array(np.random.multivariate_normal(mean=theta_syn, cov=cov_syn, size=R_syn))

# compute utility
U_syn = list()
for i, xi in enumerate(X_syn):
ui_syn = (xi * beta_syn[R_arr_syn[i]]).sum(axis=1)
U_syn.append(ui_syn)
U_syn = np.array(U_syn)
## softmax
P_syn = sp.special.softmax(U_syn, axis=1)
## simulate choices
y_syn_multi = np.array([np.random.multinomial(n=1, pvals=p_syn) for p_syn in P_syn])
y_syn = np.array([yi.argmax() for yi in y_syn_multi])
``````