Parametrize mixed beta regression in non centered way


I am fitting a model to estimate the parameters of the distribution of my observations. My observations are proportions measured in the continuous scale, hence they are bounded in the interval [0, 1]. The values are well distributed around 0.5 with a bell-curve shape. I am also modelling the random effect for each subject. I was wondering if I could improve my model by parametrizing it in a non-centered way. The (centered) model I devised so far is:

y_{likelihood}=Beta(a, b), with a=\mu\cdot\phi, b=(1-\mu)\cdot\phi, and \phi is the precision parameter, which I assume to be common to all observations. Then \mu=logit^{-1}(\mathbf{X}\beta+\mathbf{Z}\gamma), where the design matrix \mathbf{X} has dimension (n_{observations}, 1), i.e. is intercept only, and \mathbf{Z} has dimension (n_{observations}, n_{subjects}).

\beta is the main effect, and I set the prior \beta \sim \mathcal{N}(0,10). \gamma_{s} is the random effect for each subject (the deviation of each subject around the intercept for the population \beta), and I set the prior \gamma_{s} \sim \mathcal{N}(0,\sigma) for s=1...n_{subjects}, with \sigma\sim Half \mathcal{N}(10) common to the all the subjects. The model seems to work fine, I do not get divergences, and the posterior check looks good.

I tried to implement the non-centered version in a naive way with gamma_{offest, s}\sim\mathcal{N}(0, 1) and the deterministic \gamma_s=\gamma_{offset, s}\cdot\sigma (as suggested in Wiecki’s blog post here), but I get really odd values for gamma_{offest, s} (like reeeeaaaaally wide and flat distributions). I feel like this is not the right approach in this case, as I need to take into account the support of the beta distribution, and how the values in my linear regression get squashed by the inverse logit transformation.

Do you have any suggestions?


Your way of modelling it seems quite reasonable to me. If the coefficient \gamma is too flat you can change reduce the scale of the prior of \sigma (a HalfNormal(1.) is likely sufficient).


So, both the centered and non-centered version seem allright? Great. Then I will try again the non-centered version, scaling down the priors. I will let you know!


Ok, I tried to narrow the priors and it works! Thank @junpenglao! I have another question though (disclaimer: may be quite silly). In the non-centered parametrization we have \sigma whose prior is usually set quite wide (e.g., Half\mathcal{N}(10)) and then I see \gamma_{offset} usually set quite narrow (e.g. \mathcal{N}(0, 1)) . \sigma and \gamma_{offset} are then multiplied together. What is the rationale behind this? What if instead we set a narrow prior for \sigma and a wide prior for \gamma_{offset}? Or both priors very wide? Do you have a good reference on this to read?


My personal experience is that scale of the offset often doesnt matter that much, as there are quite a lot of information to infer the mean, but the scale of the \sigma does play a key role, as there is not enough information and the prior becomes quite important.


Ok thank you. I did realize that the prior on \sigma sometimes is tricky to set, and it may lead to bad convergence. If you come across a nice read on the topic, please share it :wink:


To me, The prior can generally only be understood in the context of the likelihood from Andrew Gelman, Daniel Simpson, and Michael Betancourt is a must read. Also the idea of simulating from prior as part of your Bayesian workflow: which is a very intuitive way to check your prior.

More formal text including:
Penalising model component complexity: A principled, practical approach to constructing priors
Constructing Priors that Penalize the Complexity of Gaussian Random Fields