What I want to do is calculate the complexity of reading a word based on the sequence of phonemes it contains.

We can leverage CMU dict to look up words and their respective phonemes. We then want to estimate how probable each phoneme sequence is: the more probable the sequence, the lower the score we give it (meaning easier to read).
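A minimal sketch of the scoring idea, assuming a phoneme-bigram model (a modelling choice made here for illustration, not something the post specifies). The tiny `LEXICON` is a hypothetical hand-typed stand-in for CMU dict entries with stress digits stripped; `bigram_counts`, `difficulty`, and the add-one smoothing are likewise assumptions. The score is the negative log-probability of the sequence, so less probable sequences get higher (harder) scores.

```python
import math
from collections import Counter

# Hypothetical toy lexicon standing in for CMU dict (stress digits stripped).
LEXICON = {
    "cat": ["K", "AE", "T"],
    "bat": ["B", "AE", "T"],
    "bad": ["B", "AE", "D"],
    "tab": ["T", "AE", "B"],
    "dog": ["D", "AO", "G"],
}

BOS, EOS = "<s>", "</s>"
# Possible next symbols: the 39 CMU dict phonemes plus the end-of-word marker.
VOCAB_SIZE = 39 + 1


def bigram_counts(lexicon):
    """Count phoneme bigrams (with word-boundary markers) across the lexicon."""
    pair_counts = Counter()
    context_counts = Counter()
    for phones in lexicon.values():
        seq = [BOS] + phones + [EOS]
        for prev, cur in zip(seq, seq[1:]):
            pair_counts[(prev, cur)] += 1
            context_counts[prev] += 1
    return pair_counts, context_counts


def difficulty(phones, pair_counts, context_counts, vocab_size):
    """Negative log-probability of a phoneme sequence under an add-one-smoothed
    bigram model; higher means a less probable (harder) sequence."""
    seq = [BOS] + phones + [EOS]
    nll = 0.0
    for prev, cur in zip(seq, seq[1:]):
        p = (pair_counts[(prev, cur)] + 1) / (context_counts[prev] + vocab_size)
        nll -= math.log(p)
    return nll


pairs, contexts = bigram_counts(LEXICON)
print(difficulty(["B", "AE", "T"], pairs, contexts, VOCAB_SIZE))  # common pattern
print(difficulty(["G", "AO", "K"], pairs, contexts, VOCAB_SIZE))  # unseen pattern, higher
```

Because the score sums one term per bigram, longer words naturally score higher; dividing by the number of bigrams is one option if length should not dominate.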

I am using `pm.Dirichlet` because I think it fits the use case pretty well (correct me if I am wrong here).
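Since the Dirichlet is conjugate to the categorical/multinomial likelihood, the posterior over phoneme frequencies can be written in closed form: add the observed counts to the prior concentration. The sketch below assumes a symmetric prior (`prior_alpha=1.0`), simple unigram phoneme counts, and hypothetical helper names. It also computes the Dirichlet covariance, which is always negative between distinct components because the frequencies must sum to 1; in that sense the components of a Dirichlet do "influence each other".

```python
from collections import Counter

# ARPAbet phoneme inventory used by CMU dict (39 symbols, stress digits stripped).
PHONEMES = [
    "AA", "AE", "AH", "AO", "AW", "AY", "B", "CH", "D", "DH",
    "EH", "ER", "EY", "F", "G", "HH", "IH", "IY", "JH", "K",
    "L", "M", "N", "NG", "OW", "OY", "P", "R", "S", "SH",
    "T", "TH", "UH", "UW", "V", "W", "Y", "Z", "ZH",
]


def dirichlet_posterior(observed_phones, prior_alpha=1.0):
    """Conjugate update: posterior concentration = prior concentration + counts."""
    counts = Counter(observed_phones)
    return {p: prior_alpha + counts[p] for p in PHONEMES}


def posterior_mean(alpha):
    """Mean of a Dirichlet: alpha_i / sum(alpha)."""
    total = sum(alpha.values())
    return {p: a / total for p, a in alpha.items()}


def posterior_cov(alpha, i, j):
    """Dirichlet (co)variance; off-diagonal entries are always negative."""
    a0 = sum(alpha.values())
    ai, aj = alpha[i], alpha[j]
    if i == j:
        return ai * (a0 - ai) / (a0 ** 2 * (a0 + 1))
    return -ai * aj / (a0 ** 2 * (a0 + 1))


# Phonemes of "cat" and "bat" pooled together (hypothetical observations).
alpha = dirichlet_posterior(["K", "AE", "T", "B", "AE", "T"])
mean = posterior_mean(alpha)
print(mean["AE"], posterior_cov(alpha, "AE", "T"))
```

For sequence probabilities rather than overall frequencies, one common extension is a separate Dirichlet per preceding phoneme, i.e. one row of a transition matrix.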

Here is the ipynb

I am a bit confused by your model, probably because I am not very familiar with the topic.

Are the `theta` values supposed to influence each other (under a Dirichlet they are constrained to sum to 1)? Or would a model using independent `theta = pm.Beta("theta", alpha_prior, beta_prior, shape=39)` make more sense? In that case the posterior is just `pm.Beta.dist(alpha_prior+1, beta_prior)` if your observations are always 1 success out of 1 trial. No need to sample at all (assuming this is your entire model, of course).
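The closed-form result above is the standard Beta-Binomial conjugate update: a Beta(α, β) prior combined with k successes in n trials gives a Beta(α + k, β + n - k) posterior, which for 1 success in 1 trial is Beta(α + 1, β). A minimal stdlib sketch; the helper names and the prior values 2.0 and 5.0 are hypothetical, chosen only for illustration:

```python
def beta_posterior(alpha_prior, beta_prior, successes, trials):
    """Conjugate Beta-Binomial update: Beta(a, b) -> Beta(a + k, b + n - k)."""
    if not 0 <= successes <= trials:
        raise ValueError("need 0 <= successes <= trials")
    return alpha_prior + successes, beta_prior + trials - successes


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


# Independent prior per phoneme (shape=39), each observed as 1 success in 1 trial.
alpha_prior, beta_prior = 2.0, 5.0
posteriors = [beta_posterior(alpha_prior, beta_prior, 1, 1) for _ in range(39)]
print(posteriors[0], beta_mean(*posteriors[0]))  # (3.0, 5.0) 0.375
```

Because each `theta` gets its own independent Beta here, updating one phoneme leaves the others unchanged, unlike the Dirichlet case where the components are tied together by the sum-to-one constraint.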

Anyway, did you have a more specific question?
