# Modelling interactions between two categorical variables

I am facing a problem involving the interaction between two categorical variables. To make my question concrete, say a travel agency models the number of upgrades purchased
as a function of the flight destination and of whether the flight has a connection. Naturally, some regions will have more non-stop flights, while others will have fewer.

Here's a toy dataset:

```python
import pandas as pd

data = pd.DataFrame({
    'destination_region':      [0, 0, 1, 1, 1, 2, 2, 2, 2],
    'has_a_connection_flight': [1, 1, 0, 0, 0, 1, 0, 1, 0],
    'sum_of_upgrades':         [1, 10, 0, 10, 2, 2, 0, 100, 4],
})
n_regions = data.destination_region.nunique()
```
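Before modelling, it can help to tabulate how many flights fall in each (region, connection) cell and the mean number of upgrades in each, which makes the imbalance between regions explicit. A minimal pandas sketch using the toy data:

```python
import pandas as pd

data = pd.DataFrame({
    'destination_region':      [0, 0, 1, 1, 1, 2, 2, 2, 2],
    'has_a_connection_flight': [1, 1, 0, 0, 0, 1, 0, 1, 0],
    'sum_of_upgrades':         [1, 10, 0, 10, 2, 2, 0, 100, 4],
})

# Rows: destination region; columns: 0 = non-stop, 1 = has a connection
counts = pd.crosstab(data['destination_region'], data['has_a_connection_flight'])

# Mean upgrades per cell; cells without any flights come out as NaN
means = data.pivot_table(index='destination_region',
                         columns='has_a_connection_flight',
                         values='sum_of_upgrades',
                         aggfunc='mean')
print(counts)
print(means)
```

In this toy data, region 0 has only connecting flights and region 1 only non-stop ones, so two of the six cells are empty. Any interaction estimate for those cells has to come from the prior or from pooling across cells.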

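Modelling the number of upgrades as a function of each independent variable on its own is straightforward:

```python
import pymc as pm

# One mean and one standard deviation per destination region
with pm.Model() as destination_model:
    mu = pm.Uniform('mu', lower=0.1, upper=10, shape=n_regions)
    sigma = pm.Uniform('sd', lower=0.1, upper=10, shape=n_regions)
    # Likelihood (assumed Normal): pick each row's region parameters by index
    region = data.destination_region.values
    pm.Normal('obs', mu=mu[region], sigma=sigma[region], observed=data.sum_of_upgrades.values)
    trace_destination = pm.sample(200, tune=50)

# One mean and one standard deviation per connection status (0 = non-stop, 1 = connection)
with pm.Model() as connection_model:
    mu = pm.Uniform('mu', lower=0.1, upper=10, shape=2)
    sigma = pm.Uniform('sd', lower=0.1, upper=10, shape=2)
    conn = data.has_a_connection_flight.values
    pm.Normal('obs', mu=mu[conn], sigma=sigma[conn], observed=data.sum_of_upgrades.values)
    trace_connection = pm.sample(200, tune=50)
```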
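In PyMC, per-group parameters are usually linked to observations by integer indexing, e.g. `mu[region]`. The NumPy sketch below shows that mechanism with made-up parameter values standing in for the PyMC random variables (the arrays `mu_region` and `mu_cell` are illustrative only). It extends to two variables by indexing a 2-D table with both columns at once, or equivalently a flattened table with a combined cell id.

```python
import numpy as np

region = np.array([0, 0, 1, 1, 1, 2, 2, 2, 2])
conn = np.array([1, 1, 0, 0, 0, 1, 0, 1, 0])

# Illustrative per-region means (stand-in for a PyMC `mu` with shape=n_regions)
mu_region = np.array([10.0, 20.0, 30.0])
row_mu = mu_region[region]  # one mean per observation

# Two variables: an illustrative (n_regions, 2) table indexed by both columns
mu_cell = np.array([[1.0, 2.0],
                    [3.0, 4.0],
                    [5.0, 6.0]])
row_mu_cell = mu_cell[region, conn]

# Equivalent flat encoding: one "cell id" per row into the row-major table
cell_id = region * 2 + conn
row_mu_flat = mu_cell.ravel()[cell_id]  # same values as row_mu_cell
```

The same indexing expressions work on PyMC tensors, so a parameter declared with `shape=(n_regions, 2)` can be indexed as `mu_cell[region, conn]` inside a model.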

However, how should I model the interaction between the two variables, i.e. let the effect of a connection differ by destination region?

Hi Ivan

You might want to consider the proportion of customers who upgrade rather than the absolute number who do.

Here is one model that treats having a connection as an additive covariate:

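```python
with pm.Model() as combined_model:
    # One baseline mean per destination region
    mu_destination = pm.Uniform('mu_destination', lower=0.1, upper=10, shape=n_regions)
    # Shift applied to every flight with a connection (the same in all regions)
    beta_connection = pm.Uniform('beta_connection', lower=0.1, upper=10)
    sigma = pm.Gamma('sd', 2, 1)

    # Expected upgrades per row: region baseline plus connection shift
    mu = (mu_destination[data.destination_region.values]
          + beta_connection * data.has_a_connection_flight.values)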