Thank you very much for taking the time again to answer my questions. I'm trying to understand it and why it may be the better approach.
I agree, that’s why I’ve come here.
Too bad I’m still a noob at Bayesian inference.
No. It’s a motor task measuring 2 end-effector positions, thus forming a trial-to-trial point cloud in a Cartesian 2D coordinate system. The demeaned 2D coordinate vectors get projected onto a vector and its orthogonal, which is basically just a change of basis. The “parallel” and “orthogonal” data are merely the coefficients in $\vec{v} = a_\parallel \hat{i} + a_\perp \hat{j}$.
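For concreteness, a minimal numpy sketch of that projection (the data and the direction vector below are just placeholders, not my actual values):

```python
import numpy as np

# trial-by-trial end-effector positions, shape (n_trials, 2) -- placeholder data
xy = np.random.default_rng(0).normal(size=(100, 2))
xy_demeaned = xy - xy.mean(axis=0)

# unit vector defining the "parallel" direction, and its orthogonal
u_par = np.array([np.cos(0.3), np.sin(0.3)])      # placeholder angle
u_orth = np.array([-u_par[1], u_par[0]])

# change of basis: the coefficients a_parallel and a_orthogonal per trial
a_par = xy_demeaned @ u_par
a_orth = xy_demeaned @ u_orth
```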
Yes, in my first post this was the case. That’s why I modelled mu, which was then the parameter I was interested in, since in that case mu corresponded to the variance of the untransformed projection data. The squared data were also log-transformed to pull in extremes (due to the squaring) and to stabilize the variance of mu itself (variance of a variance), etc. It may have been more complex than it needed to be.
For the non-squared data in my second post, however, the mean is guaranteed to be 0; that’s why I didn’t model mu, only sigma.
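So for one participant, what I mean is something like this minimal sketch (the data and the prior are made up, purely for illustration):

```python
import numpy as np
import pymc as pm

# stand-in for one participant's demeaned parallel projections
a_par = np.random.default_rng(1).normal(0, 0.5, size=100)

with pm.Model() as model:
    sigma = pm.HalfNormal("sigma", sigma=1.0)              # prior just for illustration
    pm.Normal("obs", mu=0.0, sigma=sigma, observed=a_par)  # mean fixed at 0, only sigma estimated
    idata = pm.sample()
```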
By flattening, do you mean throwing all participants together? I don’t quite get your code example: mu and sigma have shape n_users, but then they are estimated for only 1 user while using the data of all users at the same time? I’m confused.
Since participants may deploy different strategies in the motor task (they are assigned to different conditions as a between-subjects factor, and each participant may come up with their own solution), I want to estimate model parameters per participant. I started with one separate model per participant before I discovered I could vectorize the parameters. With separate models there was no missing data. Only after vectorizing the parameters, instead of having one model per participant, did I face the missing-data issue, since participants differ in their trial count per block (one column per participant, trials as the rows in the observed data).
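If the flattening idea is what I think it is, it would look roughly like this: one long vector of trials plus a participant index, so differing trial counts don’t create missing cells (data and priors below are made up):

```python
import numpy as np
import pymc as pm

# long format: one row per trial, plus a participant index (toy data with differing trial counts)
rng = np.random.default_rng(0)
n_trials = [80, 95, 60]                                   # trials per participant
participant_idx = np.repeat(np.arange(3), n_trials)       # 0, 0, ..., 1, 1, ..., 2, 2, ...
a_par = np.concatenate([rng.normal(0, s, n) for s, n in zip([0.3, 0.5, 0.8], n_trials)])

with pm.Model() as model:
    sigma = pm.HalfNormal("sigma", sigma=1.0, shape=3)    # one sigma per participant
    pm.Normal("obs", mu=0.0, sigma=sigma[participant_idx], observed=a_par)
    idata = pm.sample()
```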
Yes, I only left out the blocks from my previous examples for brevity, since they weren’t relevant to those simple models, and because I was trying to understand the basics first. If I understand you correctly, this is the approach where I have 1 uber-model per participant? And for the variables I’m interested in (9 differences) I would create deterministic variables, e.g. sigma[2] - sigma[1] for the difference in parallel deviation between block 2 and block 1, and sigma[1] - sigma[4] for the difference between parallel and orthogonal deviation in block 2?
But if I had all blocks in one vectorized form (trials as rows, block_<projection> as columns), I’d again have missing data, since the trial count differs between blocks? This could be solved, though, by having block1, block2, block3 with shape 2 each (parallel & orthogonal), or by indexing into a block/projection dimension in long format, as in the sketch below.
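For one participant with 3 blocks × 2 projections, continuing in the same long-format style (all names, priors, and data below are placeholders):

```python
import numpy as np
import pymc as pm

rng = np.random.default_rng(0)

# one participant: 3 blocks x 2 projections in long format, differing trial counts per block
n_per_block = [40, 55, 50]
block_of_trial = np.repeat([0, 1, 2], n_per_block)              # block index per trial
a = np.concatenate([rng.normal(0, 0.5, len(block_of_trial)),    # toy "parallel" coefficients
                    rng.normal(0, 0.3, len(block_of_trial))])   # toy "orthogonal" coefficients
proj_idx = np.repeat([0, 1], len(block_of_trial))               # 0 = parallel, 1 = orthogonal
blk_idx = np.tile(block_of_trial, 2)

with pm.Model() as model:
    sigma = pm.HalfNormal("sigma", sigma=1.0, shape=(3, 2))     # sigma[block, projection]
    pm.Normal("obs", mu=0.0, sigma=sigma[blk_idx, proj_idx], observed=a)

    # deterministic variables for two of the 9 differences of interest
    pm.Deterministic("par_b2_minus_b1", sigma[1, 0] - sigma[0, 0])
    pm.Deterministic("par_minus_orth_b2", sigma[1, 0] - sigma[1, 1])
    idata = pm.sample()
```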
Having those 9 estimates and their uncertainty would be a good thing. Thanks!
Having all 9 estimates alone doesn’t fully answer my question, though, since there are competing theoretical predictions about the sizes of those differences, see the ASCII diagrams above. I’d like to know whose theory is most compatible with the observed data. Wouldn’t that involve a secondary step of model selection with the estimated differences as observed data, somehow including their uncertainty?
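One thing I could imagine (not sure whether this is what you had in mind) is checking each theory’s predicted pattern directly on the posterior draws of the deterministic differences, reusing `idata` from the sketch above; the “theory X” pattern here is just a made-up example:

```python
import numpy as np

# posterior draws of the deterministic differences from the sketch above
d_proj = idata.posterior["par_minus_orth_b2"].values.ravel()
d_block = idata.posterior["par_b2_minus_b1"].values.ravel()

# posterior probability that a main effect of projection holds (difference > 0)
p_projection = np.mean(d_proj > 0)
# posterior probability of a more specific pattern predicted by one (made-up) theory
p_theory_x = np.mean((d_proj > 0) & (d_block > 0))
```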
The approach I had with several models was to state from the beginning what I’d expect the differences to look like for each theory. The null model would have no differences whatsoever. A main-effect-of-projection model would state that there is a constant unknown difference (>0) between the projections across blocks. And then more sophisticated models with statements about the differences between individual blocks/projections. I feel like this was the opposite of what you suggested, namely first getting the estimates of those differences and then making statements about them.
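If I were to stick with the several-models route, I guess the final comparison would look something like this (the model names and InferenceData objects are placeholders for the actual fitted competing models, each with the log-likelihood stored):

```python
import arviz as az

# placeholder names for the fitted competing models (null, main effect of projection, full)
comparison = az.compare(
    {"null": idata_null, "projection": idata_projection, "full": idata_full},
    ic="loo",
)
print(comparison)
```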