Hi everyone, I am new to PyMC3, and I have a small question after reading the tutorial "Getting started with PyMC3". It is an example of linear regression.
After sampling, we can get the mean and the standard deviation of the unknown model parameters with "pm.summary(trace)". We can also get some plots with "pm.traceplot(trace)". I am curious how to find out the posterior distribution type of those unknown model parameters (e.g., normal, log-normal or beta). If we cannot get the posterior distribution type directly, is there a way to export the data and run something like a K-S test?
Thanks a lot.
Hello - I am relatively new to this too, but I will try to answer assuming someone else will correct the points I get wrong.
There are certain simple problems where analytical solutions exist and you know the type of the parameter's posterior distribution. The best examples of this are the classic coin-flipping problems using the beta distribution, like the example here.
However, this is not generally possible for more complicated problems.
When you use MCMC, there is nothing that tells you the type of distribution the posterior sampling produced. Instead, you would use the same process you would use for any data set to match it to a specific distribution. Look at the data in the posterior (is it continuous, discrete, all positive, only defined between 0 and 1?), and think about what specific distributions would mean for your model. From this, you can pick candidate distributions, see if they fit, and try to understand what that means.
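As a small self-contained sketch of that fitting process (the posterior samples here are synthetic stand-ins for something like trace["slope"], not from an actual PyMC3 run): fit each candidate distribution with scipy and run a K-S test against the fitted parameters.

```python
import numpy as np
from scipy import stats

# Stand-in for posterior samples (e.g. trace["slope"] from a PyMC3 trace);
# drawn from a normal here so the example is self-contained.
rng = np.random.default_rng(0)
samples = rng.normal(loc=2.0, scale=0.5, size=2000)

# Fit each candidate distribution, then run a K-S test against the fit.
results = {}
for name, dist in [("normal", stats.norm), ("log-normal", stats.lognorm)]:
    params = dist.fit(samples)
    ks_stat, p_value = stats.kstest(samples, dist.cdf, args=params)
    results[name] = (ks_stat, p_value)
    print(f"{name}: KS statistic = {ks_stat:.3f}, p = {p_value:.3f}")
```

One caveat: a K-S test whose parameters were estimated from the same data is only approximate (the p-value is optimistic), so treat it as a rough goodness-of-fit check rather than a formal test.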
In my limited experience, knowing the exact distribution of a parameter’s posterior does not seem useful. It is more important to understand what the sampled posterior distributions predict for your posterior predictive checks and decide whether that makes sense.
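A rough numpy-only illustration of a posterior predictive check (all names and numbers here are made up for the sketch; in PyMC3 the parameter draws would come from the trace): draw parameter values from the posterior samples, simulate replicated data sets, and compare a summary statistic of the replications to the observed value.

```python
import numpy as np

rng = np.random.default_rng(1)

# Pretend these are posterior samples of a mean and a noise scale
# (in a real model they would come from the trace).
mu_samples = rng.normal(5.0, 0.1, size=1000)
sigma_samples = np.abs(rng.normal(1.0, 0.05, size=1000))

# Observed data the model was fit to (synthetic for this sketch).
observed = rng.normal(5.0, 1.0, size=200)

# For each posterior draw, simulate a replicated data set and record
# a summary statistic; the fraction of replications at or above the
# observed value is a simple posterior predictive p-value.
rep_means = np.array([
    rng.normal(mu, sigma, size=observed.size).mean()
    for mu, sigma in zip(mu_samples, sigma_samples)
])
obs_mean = observed.mean()
p_ppc = np.mean(rep_means >= obs_mean)
print(f"observed mean = {obs_mean:.2f}, PPC p-value = {p_ppc:.2f}")
```

A p-value near 0 or 1 would suggest the model systematically mis-predicts that statistic; values in between mean the observed data look plausible under the posterior.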
Just to reiterate, I am new at this too, so take it with a grain of salt.
Thanks a lot Adam. The example you attached helped me. Actually, I think my question can be simplified to: how can I obtain the sampled data of the unknown model parameters from the posterior distribution? Then I can run some analysis to learn more about the posterior distribution beyond the mean and SD. The example you attached gave me the answer, and it is very simple.
I think the reason I had this question is that I am not only new to PyMC3 but also to Python. Anyway, thanks a lot for your help. Really appreciate it.
This is an easier problem!
You probably specified a model with something like:
with pymc3.Model() as model:
    m = pymc3.Normal("slope", 0, 5)
    b = pymc3.Normal("intercept", 0, 10)
    mean = pymc3.Deterministic("mean", m * x + b)
    error = pymc3.Exponential("error", 3)
    y = pymc3.Normal("likelihood", mean, error, observed=my_data)
    trace = pymc3.sample(1000)
To get a numpy array with the samples from the posterior just use:
trace["slope"]
or
trace["intercept"]
or
trace["error"]
The names you use in the brackets are the names you assigned in the model; they can be anything you choose.
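Once you have that numpy array, any further analysis is plain numpy. A small sketch (using a synthetic array in place of trace["slope"], since this example doesn't run a sampler):

```python
import numpy as np

# Synthetic array standing in for trace["slope"].
rng = np.random.default_rng(42)
posterior = rng.normal(1.5, 0.3, size=4000)

# Summary statistics beyond the mean and SD that pm.summary reports.
mean = posterior.mean()
sd = posterior.std()
lo, median, hi = np.percentile(posterior, [2.5, 50, 97.5])
print(f"mean = {mean:.3f}, sd = {sd:.3f}")
print(f"95% interval: [{lo:.3f}, {hi:.3f}], median = {median:.3f}")
```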
I think going through a reasonable amount of this book will help a lot. Also, check out the PyMC3 Getting Started guide.
Yeah, I realized it is a very easy question lol. Thanks a lot for your help.
Lots of problems are very easy in hindsight! I remember searching for exactly this not so long ago.
I have run into this problem many times myself. There are some nice scripts that "compare" different distributions for you and return which of them fits best. You will find one at the next link.