Partial pooled model with complex categories

BingKong1988 · July 11, 2018, 3:40pm

Hi,

I am trying to do a regression analysis on my data. The data set has multiple observations, each of which is a time series (y~t) that can is modeled theoretically by y = A*(1+B*t)**B. I have tried the following model in pymc3 which works fine,

mu_A ~ Normal(…)
sig_A ~ HalfCauchy(…)
mu_B ~ Normal(…)
sig_B ~ HalfCauchy(…)

A ~ Normal(mu_A, sig_A, n_obs)
B ~ Normal(mu_B, sig_B, n_obs)

sig ~ HalfCauchy(…)
y ~ Normal( A[obs]*(1+B[obs]*t)**B[obs], sig, observed=…)

My question is: the observations have categorical parameters cat_1 and cat_2, for cat_1, each observation can only belong to one category of cat_1 (either cat_1_1 or cat_1_2); for cat_2, each observation can belong to one or more categories (observation #1 belong to cat_2_1, observation #2 belong to cat_2_1 and cat_2_3, observation #3 belong to cat_2_1 and cat_2_2 and cat_2_3). How can I add those categorical parameters in my model? Any suggestion will be greatly appreciated. Thanks!

junpenglao · July 12, 2018, 4:39am

cat_1 should be relatively straight forward, as you can have a categorical predictor of 0 and 1. For cat_2 you do it similarly, but code each category as a feature and have a similar 0 and 1 predictor. Something like:

cat1 = [0, 1, 1, 0, ...]
cat2_1 =  [0, 0, 1, 0, 1, 0, ...]
cat2_2 =  [1, 0, 1, 1, 0, 0, ...]
...

with the linear prediction:

cat1beta = Normal(..., shape=2)
cat2_1beta = Normal(...)
cat2_2beta = Normal(...)
cat2_3beta = Normal(...)
y = A*(1+B*t)**B + cat1beta[cat1] + cat2_1beta*cat2_1 + ...

benyi-mikara · July 12, 2018, 8:04am

I have recently encountered this problem. Here’s how I solved it:

step 1 - use pandas to dummy code your categorical variable:
A = pd.get_dummies(A,prefix='cat1_',columns=['cat_1'],drop_first=True)
make sure you set drop_first = True, as this removes the redundant column.
After this step, you should see a few new columns in your dataframe of [0,1] that correspond to cat_1

Step 2 - select cat_1 matrix:
cat1_col_list = [col for col in data if col.startswith(‘cat1_’)]
cat1_cols = data[cat1_col_list ]
no_cat1 = len(cat1_col_list )

Step 3 - sample with the categorical variable
I would modify @junpenglao 's suggestions slightly to
cat1beta = Normal(…, shape = no_cat1)
y = A*(1+Bt)**B + math.matrix_dot(cat1_cols,cat1beta )

This way, you should be able to handle categorical variables with any number of categories without having to manually write up each category.

@junpenglao can you check if this makes sense? particularly with the use of theano matrix_dot operations.

junpenglao · July 12, 2018, 9:21am

Yep, the two should be equivalent, but using matrix multiplication would be more flexible and easier to extend in most case.

Topic		Replies	Views
Regression model using only categorical variables Questions	9	5942	October 26, 2021
Predicting with Categorical Questions	5	2011	October 29, 2019
Trouble specificying X \| a, b, c, d ~ Categorical( . ) Questions	5	496	March 2, 2019
Combination of bayesian models in pymc3 Questions	9	1957	August 13, 2020
Categorical model with continuous dependent variable Questions	10	2918	February 5, 2018

Partial pooled model with complex categories

Related topics