Damn… Thank you so much for your reply. It made a lot of things clear.
It feels a bit weird to fit Normal distributions to my priors, since there’s no reason to believe that they are normally distributed. I definitely need to find some time to study the basics of all this more.
The choice of predictors is just based on the dataset that I have… I have cleaned many words and symbols from the data, but I’m still left with many different words, and when I include bigrams that number goes to ~2k.
Based on your post, I went all the way back to the data cleaning part to try to select these predictors better… The funny part was that sampling took longer when I was using only words than when I included bigrams (and I remember that in one of your first episodes someone mentioned that sometimes, when you include a new feature, everything “makes sense” to the model and it fits much better).
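Just to make the feature-count point concrete, here’s a toy sketch (not my actual pipeline, and the corpus is made up) of why adding bigrams inflates the number of predictors so quickly:

```python
from collections import Counter

# Hypothetical toy corpus standing in for the cleaned text data.
docs = [
    "the model fits well",
    "the model samples slowly",
    "bigrams help the model",
]

unigrams = Counter()
bigrams = Counter()
for doc in docs:
    tokens = doc.split()
    unigrams.update(tokens)
    # Each adjacent pair of tokens is a bigram predictor.
    bigrams.update(zip(tokens, tokens[1:]))

# Even on three tiny sentences, bigrams nearly double the feature count;
# on a real corpus this is how you end up with ~2k predictors.
print(len(unigrams), "unigrams,", len(bigrams), "bigrams")
```

On a real dataset you’d usually also prune by frequency (drop bigrams that appear only once or twice) to keep the predictor count manageable.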
I also have to look a lot into prior/posterior predictive checks. Hopefully I’ll get these concepts soon.
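In case it helps anyone following along, here’s a minimal NumPy-only sketch of the prior predictive idea, assuming (hypothetically) a logistic regression with Normal(0, 1) priors over many word predictors — it shows why wide priors on ~2k features imply absurd predictions before seeing any data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 100 documents, 2000 binary word-presence predictors,
# and independent Normal(0, 1) priors on every coefficient.
n_docs, n_predictors, n_draws = 100, 2000, 500
X = rng.integers(0, 2, size=(n_docs, n_predictors))

# Prior predictive check: draw coefficients from the prior, push the data
# through the model, and inspect the implied outcome probabilities.
beta = rng.normal(0.0, 1.0, size=(n_draws, n_predictors))
logits = np.clip(X @ beta.T, -30, 30)  # clip to avoid overflow in exp
probs = 1.0 / (1.0 + np.exp(-logits))

# Summing ~1000 N(0, 1) terms gives logits with sd ~30, so most implied
# probabilities pile up at 0 or 1 -- a hint the priors are far too wide.
extreme = (probs < 0.01).mean() + (probs > 0.99).mean()
print("mean prob:", probs.mean(), "fraction near 0 or 1:", extreme)
```

The fix this check suggests is to shrink the prior scale (e.g. something like Normal(0, 1/sqrt(n_predictors))) so the prior predictive distribution looks reasonable, which is exactly what these checks are for.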
Thanks again.