Trouble fitting Bayesian GLM with large number of parameters

I am having trouble getting a Bayesian GLM with a large number of parameters to converge, even though the model exactly follows the data-generating process.

The data is generated according to the following (simulated) process. There are a number of users who give scores across a number of weeks. A user's score for a given week is sampled from a normal whose mean is the sum of a user-specific term, a week-specific term and a user-specific treatment effect scaled by a change factor that grows across weeks. This normal has constant variance across all users and weeks. The user- and week-specific terms are simply drawn from normal distributions. The treatment effect is the dot product of global user weights and user-specific features; a per-user effect size sampled from a normal is used to scale the weights before this dot product, to achieve the desired user-specific treatment effect.

Here is Python-like pseudocode to generate the data according to the above procedure:

num_users = 10000
num_weeks = 12
user_terms = Normal(user_mean, user_std, size=num_users) 
week_terms = Normal(0, week_std, size=num_weeks)
treatment_effects = Normal(effect_size, effect_size / 2, size=num_users)
global_user_weights = Laplace(weights_mean, weights_decay, size=num_features) 

# Normalize global user weights so they sum to 1
scaled_global_user_weights = global_user_weights / sum(global_user_weights) 

score_matrix = ones(rows=num_users, cols=num_weeks)
for user_id, user_term in enumerate(user_terms):
    user_features = choice([0, 1], size=num_features) # binary features 
    for week_id, week_term in enumerate(week_terms):
        # change_factor grows logarithmically from 0 in the first week to 1 in the last;
        # velocity controls how quickly it saturates
        change_factor = log(week_id * (velocity - 1) / (num_weeks - 1) + 1) / log(velocity)
        # Scale global user weights to induce a maximum treatment_effect of effect_size
        # when the dot product is taken with user specific features
        weights = scaled_global_user_weights * treatment_effects[user_id]
        treatment_effect = dot(weights, user_features) * change_factor
        score_mean = user_term + week_term + treatment_effect
        score = Normal(score_mean, score_std) 
        score_matrix[user_id, week_id] = score

We also have a more complicated model where the treatment effect is the dot product of the global user weights with the user features, plus the dot product of user-type-specific weights with the user features, plus a user-type-specific intercept term. K-means is run on a user embedding, outside the Bayesian model, to determine cluster membership, and the cluster a user belongs to determines which set of user-type-specific weights applies to that user. For readability I will omit the full pseudocode; it follows the same structure as above except for how treatment_effect is computed (sketched below).
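
Roughly, the inner-loop computation of treatment_effect becomes something like the following (same pseudocode style; user_type_weights, user_type_intercepts and kmeans_cluster_ids are illustrative names):

# Cluster membership comes from K-means run on the user embeddings, outside the Bayesian model
user_type = kmeans_cluster_ids[user_id]
treatment_effect = (dot(scaled_global_user_weights, user_features)
                    + dot(user_type_weights[user_type], user_features)
                    + user_type_intercepts[user_type]) * change_factor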

Given the user features, weeks and scores (one per user per week), we are trying to recover the global user weights (plus the user-type-specific weights and intercepts in the more complicated model) along with the user- and week-specific terms. Since we know how the data is generated, we set extremely informative priors and have the model structure exactly mimic the data-generating process, in the hope of being able to recover the parameters.
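
To make the structure concrete, here is a minimal sketch of the kind of model I mean (written in PyMC for illustration; the sizes, hyperparameter values and names are placeholders rather than my exact code):

import numpy as np
import pymc as pm

# Placeholder sizes and hyperparameters (illustrative values only)
num_users, num_weeks, num_features = 10_000, 12, 10
user_mean, user_std, week_std = 0.0, 1.0, 0.5
effect_size, score_std, velocity = 1.0, 1.0, 4.0

# X: binary user features; change: the per-week change factors; scores: the simulated score matrix
X = np.random.binomial(1, 0.5, size=(num_users, num_features))
weeks = np.arange(num_weeks)
change = np.log(weeks * (velocity - 1) / (num_weeks - 1) + 1) / np.log(velocity)
scores = score_matrix  # observed (num_users, num_weeks) matrix from the simulation above

with pm.Model() as model:
    user_terms = pm.Normal("user_terms", mu=user_mean, sigma=user_std, shape=num_users)
    week_terms = pm.Normal("week_terms", mu=0.0, sigma=week_std, shape=num_weeks)
    raw_weights = pm.Laplace("raw_weights", mu=0.0, b=1.0, shape=num_features)
    # Normalized as in the data-generating process
    weights = pm.Deterministic("weights", raw_weights / pm.math.sum(raw_weights))
    effect_scale = pm.Normal("effect_scale", mu=effect_size, sigma=effect_size / 2, shape=num_users)

    # Per-user treatment effect, scaled by the week-specific change factor
    effect = effect_scale * pm.math.dot(X, weights)                    # (num_users,)
    mu = (user_terms[:, None] + week_terms[None, :]
          + effect[:, None] * change[None, :])                         # (num_users, num_weeks)

    pm.Normal("obs", mu=mu, sigma=score_std, observed=scores)
    idata = pm.sample()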

We are exploring having 10,000+ users, 12 weeks and a global user weight dimension of 10 (in the more complicated model there are also user-type weights of dimension 10 per user type plus one intercept per user type, with 10 user types at the moment). Thus the number of parameters grows large: the per-user terms alone account for 10,000+ parameters, on top of the week terms, the global weights and, in the complicated model, roughly 100 user-type weights and 10 intercepts. Consequently, the model has poor convergence diagnostics and is ultimately unable to recover many of the ground-truth weights, especially in the case of the more complicated model.

I am looking for any general tips for fitting a model of this size. Or maybe this is completely unrealistic, in which case I would like to know what the upper bound is on the number of parameters you can conceivably have in a Bayesian GLM, or heuristics for deriving such an estimate from features of the data and model.

I would be happy to provide code if needed, so let me know!

Have you tried fitting a simpler version of the model? Perhaps try 1 week, or a smaller weight dimension? Or hard-coding the week-specific/treatment effect terms rather than generating them? Maybe try simulating 10,000 people with all the same qualities?

That’s usually illuminating for my problems. Sounds like there are lots of moving parts in your machine which could all go wrong in some way.
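
For example, something like this, in the same pseudocode style as your post (just a sketch of the kind of simplification I mean):

# A stripped-down simulation to sanity-check the fit before scaling back up
num_users = 500                                         # far fewer users
num_weeks = 1                                           # a single week, so no week dynamics to learn
week_terms = zeros(num_weeks)                           # hard-coded rather than sampled
treatment_effects = full(effect_size, size=num_users)   # the same effect for every user
# ...then generate scores as before, fit the correspondingly reduced model, and check that the
# global weights are recovered before adding week terms, per-user effects and user types back in.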

My understanding is that, in the Bayesian sense, data and parameters are the same thing so there shouldn’t be an upper bound.

If the priors are extremely informative, have you tried sampling from them to see if the results look sensible?
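
If you happen to be using PyMC, for example, a prior predictive check is roughly the following (a sketch; "model" is your pm.Model and the variable names need adjusting to your code):

import pymc as pm

with model:  # the pm.Model() containing your priors and likelihood
    prior = pm.sample_prior_predictive()

# Do the scores implied by the priors cover a sensible range?
prior_scores = prior.prior_predictive["obs"]
print(float(prior_scores.mean()), float(prior_scores.std()))

# And do the prior draws of the weights look like what you intended?
print(prior.prior["weights"].mean(dim=("chain", "draw")))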

Sorry for the really delayed response, but the issue was resolved by simplifying the model: I realized that it was misspecified. Thanks for taking the time to respond to my query :slight_smile:
