Sampling from a prior distribution using MCMC can often be harder than you expect. In this case, the sampler is faced with the following likelihood function:
Since the standard deviation of the Normal is so small, the likelihood is essentially three spikes at the three possible means, with a sea of zero density everywhere else. Depending on its initial value, the sampler will find the nearest mean and wander around it. It will not be able to make a proposal that crosses the large gulf between the means, and so it will never draw samples from the other modes. This blog post by @colcarroll has some nice illustrations for building intuition about this issue. This animation is also one I really like.
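To make the failure mode concrete, here is a minimal pure-NumPy sketch (the means, sigma, and step size are illustrative assumptions, not the values from the model above): a random-walk Metropolis chain started at one of the spikes never accepts a proposal that crosses the near-zero-density gulf to the other means.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the model: an equal-weight mixture of three
# Normals with widely separated means and a tiny standard deviation.
means = np.array([-100.0, 0.0, 100.0])
sigma = 1.0

def logp(x):
    """Log density of the mixture, computed with a max-trick logsumexp."""
    log_components = -0.5 * ((x - means) / sigma) ** 2 - np.log(sigma) - 0.5 * np.log(2 * np.pi)
    m = log_components.max()
    return m + np.log(np.exp(log_components - m).sum()) - np.log(len(means))

def metropolis(x0, n_steps=5000, step=2.0):
    """Plain random-walk Metropolis; `step` is far smaller than the gap between means."""
    x = x0
    samples = np.empty(n_steps)
    for i in range(n_steps):
        proposal = x + step * rng.normal()
        if np.log(rng.uniform()) < logp(proposal) - logp(x):
            x = proposal
        samples[i] = x
    return samples

samples = metropolis(x0=0.0)
print(samples.min(), samples.max())  # the chain never leaves the mode it started in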
If you crank up the sigma to something like 10,000, you get a likelihood function that is more tractable:
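A quick way to see why this helps, sketched below with made-up numbers (the means are assumptions, not the actual values from the model): with a tiny sigma the density halfway between two means underflows to zero, while with sigma on the order of 10,000 the components overlap so heavily that the density between the means is about the same as at the means, leaving no gulf for the sampler to cross.

```python
import numpy as np
from scipy.stats import norm

# Illustrative means only; the actual values in the model above are not shown here.
means = np.array([-100.0, 0.0, 100.0])

def mixture_pdf(x, sigma):
    """Density of an equal-weight mixture of three Normals centred at `means`."""
    return norm.pdf(x, loc=means[:, None], scale=sigma).mean(axis=0)

for sigma in (1.0, 10_000.0):
    at_mean = mixture_pdf(np.array([0.0]), sigma)[0]
    between = mixture_pdf(np.array([50.0]), sigma)[0]  # halfway between two means
    print(f"sigma={sigma:>9}: density at a mean {at_mean:.2e}, between means {between:.2e}")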
And posterior samples of the categories reflect this:
Another way to remedy the situation is to condition the likelihood on some generated data, so that the sampler targets a data-informed posterior instead of the spiky prior.
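Here is a rough sketch of what that could look like in PyMC (a guess at the model's shape; the means, sigma, and generated data are all placeholder assumptions): once the Normal is observed, only the discrete category needs to be sampled, and its posterior concentrates on the component that produced the data.

```python
import numpy as np
import pymc as pm

# Placeholder values; the real model's means, sigma, and data are not shown here.
means = np.array([-100.0, 0.0, 100.0])
rng = np.random.default_rng(1)
fake_data = rng.normal(loc=means[2], scale=1.0, size=50)  # generated from the third mean

with pm.Model():
    category = pm.Categorical("category", p=np.ones(3) / 3)
    mu = pm.math.constant(means)[category]  # pick the mean implied by the sampled category
    # Conditioning on generated data: the Normal is observed instead of latent,
    # so sampling targets the posterior over `category` rather than the spiky prior.
    pm.Normal("obs", mu=mu, sigma=1.0, observed=fake_data)
    idata = pm.sample()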


