I am new here, so my insight might not be the most sensible, but there are some things that I don’t understand about your model, and maybe you explaining them to me could help you understand where the errors are:
- Reading your code or description, it is really not obvious to me what you mean. Maybe you should try expressing it in detailed mathematical terms first? For example, if `y` could only take two possible values, 0 and 1, and you had reasons to think that you can separate `X` linearly, then you could describe a logistic regression in the following way. Doing so might help you figure out the indices situation. Plus, I found that it is usually very straightforward to translate a model expressed in these mathematical terms into PyMC.
\begin{align*}
\mathrm{Intercept} &\sim \mathrm{Normal}(0, 1)\\
\nu_j &\sim \mathrm{Normal}(0,1)\\
p_i &= \mathrm{logit}^{-1}\left(\mathrm{Intercept}+\sum_j{\nu_j X_{ij}}\right)\\
y_i &\sim \mathrm{Bernoulli}(p_i)
\end{align*}
- Is a regression actually suited to your problem? From the image, it looks like there is only a small number of different possible rows in `X`. If this is indeed the case, and you are not planning to try your model on any other possible vector, then I am not sure that a regression is the best fit. You might want to consider `X` as categorical. For each category (each possible row of `X`), you could then fit an independent multinomial distribution to the values of `y` that this category can lead to.
- Talking about the values of `y`, because I don’t know what they represent, I am not sure if you should treat them as categorical or continuous. But if they really can only take a finite number of discrete values, then it might make more sense to consider them as categorical too. If they are categorical, it seems weird to store them as $\log(2d+1)$, a formula which might be irrelevant to the modeling of the problem. However, if they are actually continuous but just discretized for whatever reason, I don’t think that a multinomial distribution makes sense.
- Is there no intercept in your regression model?
- To avoid the shape problems, perhaps you should create independent random variables for each category. It might take longer to sample, and not be as elegant as multidimensional tensors, but it can be a start.
- You said that `y` can take 4 different values, but from the model graph, it seems that `n_categories` was equal to 3?
- I am not sure how you would like your model to be hierarchical, but perhaps you could start with a non-hierarchical model first, and only add the hierarchical dimension when the non-hierarchical version works? This advice is given often by experienced modelers.
Here is what I can say from looking at your post. I apologize if some of my comments don’t make any sense, but I hope that some do and that they can help you. I might be able to help you further if I knew more about the process that you are trying to model.
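In case it is useful, the idea from my second point (one independent multinomial per distinct row of `X`) can be sketched without PyMC at all. The sketch below is only an illustration under my own assumptions: it treats each distinct row of `X` as a category, counts the values of `y` observed for it, and returns the posterior mean probabilities under a symmetric Dirichlet prior with concentration `alpha`. The function name `fit_per_row_multinomial` and the choice `alpha=1.0` are mine.

```python
from collections import Counter, defaultdict


def fit_per_row_multinomial(X_rows, y_values, alpha=1.0):
    """Estimate P(y | row of X) separately for each distinct row.

    Hypothetical helper: uses the Dirichlet-multinomial posterior mean,
    (count + alpha) / (n + K * alpha), where K is the number of y levels.
    """
    y_levels = sorted(set(y_values))

    # Count how often each y value occurs for each distinct row of X.
    counts = defaultdict(Counter)
    for row, y in zip(X_rows, y_values):
        counts[tuple(row)][y] += 1

    probs = {}
    for row, counter in counts.items():
        total = sum(counter.values()) + alpha * len(y_levels)
        probs[row] = {
            level: (counter[level] + alpha) / total for level in y_levels
        }
    return probs
```

If `y` is really a discretized continuous quantity, this would not be appropriate, for the reason given in my third point.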