Thank you so much for the code example. This is great info. I don’t know if that is even anywhere in the documentation. Using the graphviz plot I see what’s going on. I imagine it now as mapping the observed data to the mus. I scoured the internet in an effort to find any explanation for how to use it, to not occupy too much more of your time, but I didn’t find a single example using the indexing method. So I am once again kindly asking for your help.
I found a well illustrated primer on different shape types. And a blog post which explains that shapes in PyMC3 are, unfortunately, one hot mess. It also states:
distribution_shape == batch_shape + event_shape
[observed] data should have ashapethat looks likesample_shape + distribution_shape.
If I understood correctly, for a model where I estimate 6 parameters (e.g. \sigma for n_blocks(3)*n_projections(2)) for each user by drawing from independent univariate distributions (same distribution family), my sample shape is n_trials (per block), batch shape is n_users*n_blocks*n_projections, and event shape = ().
I have the data in long format like you suggested:
user block direction projection
0 1 parallel 7.129121
0 1 parallel 3.414878
0 1 parallel -6.330880
0 1 parallel 0.428940
0 1 parallel -3.650233
... ... ... ...
1 3 orthogonal -2.223337
1 3 orthogonal -1.275683
1 3 orthogonal -0.796934
1 3 orthogonal 2.221719
1 3 orthogonal -5.626795
I could unstack the ‘direction’ if that was of any use, as these are guaranteed to be the same length for each individual block.
I tried different shape settings and I don’t know which one is correct. For example,
with pm.Model() as model3:
sigma = pm.HalfNormal('sigma', 5., shape=12) # 2 users * 6 parameters. batch shape?
obs = pm.Normal('obs', 0.0, sigma[samples['user']], shape=12, observed=samples['projection'])
But it’s hard to identify which parameter (sigma 0-11) belongs to which user/block/direction coordinate.
Whatever else I tried with the indexing and shape argument, I always got this error:
RecursionError: maximum recursion depth exceeded in comparison
If I manage to use it as intended I may write something about how to vectorize a model properly without missing data using your indexing method to ease the struggle for other beginners.