Using a posterior as the prior for a categorical distribution (or combining multiple data sources in a categorical analysis)

I’m implementing a PyMC3 model to estimate the probabilities of certain parameters from different data samples.
I based my model on the following great blog post:
estimating-probabilities-with-bayesian-modeling-in-python
I’ll simplify things a bit for the sake of this discussion:
Say I use a Dirichlet distribution for a parameter with three categories, a, b and c:
parameters = pm.Dirichlet('parameters', a=[1,1,1], shape=3)
After that, I use a Multinomial to introduce the observed data:
observed_data = pm.Multinomial('observed_data', n=100, p=parameters, shape=3, observed=[[50, 25, 25], [60, 20, 20]])

Finally, I use Markov chain Monte Carlo (MCMC) to sample from the posterior in order to estimate it:
trace = pm.sample(draws=10000, chains=2, tune=1000, discard_tuned_samples=True)
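
For this simplified model, the Dirichlet prior is conjugate to the Multinomial likelihood, so the posterior that the sampler approximates is also known in closed form: the prior alpha plus the observed counts per category. The following is a minimal NumPy sketch of that update, intended only as a cross-check on the trace; the helper name `dirichlet_posterior_alpha` and the second batch with n=200 are illustrative assumptions, not part of the original model.

```python
import numpy as np


def dirichlet_posterior_alpha(prior_alpha, counts):
    """Closed-form Dirichlet-Multinomial update (hypothetical helper).

    prior_alpha: concentration values of the Dirichlet prior, shape (k,)
    counts: observed category counts, shape (k,) or (rows, k)
    """
    prior_alpha = np.asarray(prior_alpha, dtype=float)
    counts = np.atleast_2d(np.asarray(counts, dtype=float))
    # Each observed row adds its counts to the concentration values.
    return prior_alpha + counts.sum(axis=0)


prior_alpha = np.ones(3)                      # flat prior, as in the model above
batch_1 = [[50, 25, 25], [60, 20, 20]]        # two samples with n=100
batch_2 = [[120, 40, 40]]                     # assumed extra sample with n=200

alpha_1 = dirichlet_posterior_alpha(prior_alpha, batch_1)
# The posterior after batch_1 serves as the prior for batch_2.
alpha_2 = dirichlet_posterior_alpha(alpha_1, batch_2)

# Updating in two steps matches updating with all rows at once.
alpha_all = dirichlet_posterior_alpha(prior_alpha, batch_1 + batch_2)
posterior_mean = alpha_2 / alpha_2.sum()
```

The row totals never enter the update, so samples with different n can be mixed freely in this conjugate setting. The shortcut stops being exact as soon as the model gains non-conjugate pieces.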

My question is: how can I use the resulting trace as the prior (that is, as the alpha values of the Dirichlet distribution) the next time I run the model?
Alternatively, I would like to run pm.Multinomial on data samples of different sizes. For example, one data source has samples with n=100 and another has samples with n=200. How can I combine both of them in the model correctly?

Thanks a lot,
Amir

Hi Amir,

  • Regarding your first question, I think this notebook will help you. However, I think you’ll have to do some numerical transformation: the posterior samples of parameters are probability vectors (values between 0 and 1 that sum to 1), not concentration values, so you probably won’t be able to pass them raw to the Dirichlet as a.
  • About your second question: this should work as usual in PyMC3. So something like: pm.Multinomial('observed_data', n=[100, 200], p=parameters, shape=(2, 3), observed=[[50, 25, 25], [120, 40, 40]]). Note the shape=(2, 3), one row per data source; as far as I can tell, that is what lets each n be matched with its own row of counts.
  • Finally, tuning samples are more important for the sampler’s health than draws, so I’d take more of the former than the latter.
Hope this helps 🖖
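
One possible form of the numerical transformation mentioned in the first point is to fit a Dirichlet to the posterior samples by matching moments, and then use the fitted concentration values as the next prior. The sketch below is only an illustration under stated assumptions: `fit_dirichlet_moments` is a hypothetical helper, and the samples are drawn with NumPy as a stand-in for what `trace['parameters']` would hold after sampling.

```python
import numpy as np


def fit_dirichlet_moments(samples):
    """Method-of-moments Dirichlet fit (hypothetical helper).

    samples: array of shape (draws, k), each row a probability vector.
    Returns estimated concentration values of shape (k,).
    """
    samples = np.asarray(samples, dtype=float)
    mean = samples.mean(axis=0)
    var = samples.var(axis=0)
    # For a Dirichlet, Var[x_k] = m_k * (1 - m_k) / (alpha_0 + 1),
    # where alpha_0 is the sum of the concentration values.
    # Average the per-category estimates of alpha_0 for stability.
    alpha_0 = np.mean(mean * (1.0 - mean) / var) - 1.0
    return mean * alpha_0


rng = np.random.default_rng(0)
# Stand-in for trace['parameters']: draws from a known Dirichlet,
# here the exact posterior of a flat prior plus counts [110, 45, 45].
posterior_samples = rng.dirichlet([111.0, 46.0, 46.0], size=20000)

alpha_hat = fit_dirichlet_moments(posterior_samples)
```

In the PyMC3 model, the fitted values would then be passed as the `a` argument of `pm.Dirichlet` on the next run. The fit is approximate in general, and the new run should only see data that was not already used to produce the trace; otherwise the same observations are counted twice.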