Probability estimation in pymc3

jasmine2018jixun · April 22, 2018, 6:09am

Hi everyone,
I apologize if this sounds like a beginner question, but I could not find the answer by Googling or searching this forum.
As far as I know, scikit-learn and Keras have a function named predict_proba, which returns the probability of each class in the model for every sample. Is there a similar function in PyMC3? I'm also confused about the meaning of the output of sample_ppc: if I take the mean value (axis=0), is that the result I need? For example, in a binary classification problem, after I pass X_test to X_shared, how do I calculate the probability that a sample in X_test is predicted as 0 or 1?

junpenglao · April 22, 2018, 6:26am

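Yes. For binary classification, the mean of the posterior predictive samples over axis=0 is the class probability, i.e. the estimated probability that each sample is 1 (and one minus that value is the probability that it is 0).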
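A minimal numpy sketch of what that mean computes, assuming the posterior predictive samples for a Bernoulli observed node come back as a 0/1 array of shape (n_draws, n_test). The draws here are faked with random numbers and a hypothetical true_p, since the arithmetic does not depend on PyMC3 itself. Averaging over axis 0 gives the estimated P(y = 1) per test row, and stacking 1 - p beside it gives an array laid out like scikit-learn's predict_proba output.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "true" class-1 probabilities for 5 test rows (unknown in practice).
true_p = np.array([0.1, 0.3, 0.5, 0.7, 0.9])

# Stand-in for posterior predictive samples of a Bernoulli observed node:
# one row per posterior draw, one column per test row, entries are 0 or 1.
n_draws = 4000
ppc_y = rng.binomial(1, true_p, size=(n_draws, true_p.size))

# Monte Carlo estimate of P(y = 1) for each test row.
p1 = ppc_y.mean(axis=0)

# Same layout as scikit-learn's predict_proba: column 0 is P(y=0), column 1 is P(y=1).
proba = np.column_stack([1.0 - p1, p1])
print(proba.round(2))
```

With a real model, ppc_y would be the entry of the sample_ppc result keyed by the observed variable's name.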

Things are a bit more complicated if you are doing multi-class classification with a Categorical distribution (e.g., observed = pm.Categorical('obs', p=p, observed=yt)). In that case, you need to find a way to sample the node p.
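A numpy sketch of the two quantities involved, assuming K = 3 classes and synthetic arrays in place of a fitted model. If posterior draws of p for the test rows are available (the reply leaves open how to obtain them), averaging them over draws gives per-row class probabilities. Draws of the Categorical node itself can also be converted into class frequencies, which is a noisier estimate of the same thing.

```python
import numpy as np

rng = np.random.default_rng(1)
n_draws, n_test, n_classes = 2000, 4, 3

# Synthetic stand-in for posterior draws of p: shape (n_draws, n_test, n_classes),
# where each p_draws[d, i, :] is a probability vector over the classes.
p_draws = rng.dirichlet([2.0, 1.0, 1.0], size=(n_draws, n_test))

# Route 1: average p over posterior draws -> (n_test, n_classes).
proba_from_p = p_draws.mean(axis=0)

# Route 2: draw one label per (draw, row) from p, as a Categorical node would
# (inverse-CDF sampling), then count how often each class appears per test row.
cum = p_draws.cumsum(axis=-1)
u = rng.random((n_draws, n_test, 1))
labels = (u > cum).sum(axis=-1)              # integer labels in {0, ..., n_classes - 1}
labels = np.minimum(labels, n_classes - 1)   # guard against float round-off in cum
proba_from_labels = (labels[..., None] == np.arange(n_classes)).mean(axis=0)

print(proba_from_p.round(3))
print(proba_from_labels.round(3))
```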

jasmine2018jixun · April 22, 2018, 6:49am

Thanks so much for your reply.
I’m also confused about the shape of the test set when using theano.shared. I tried making the test set the same shape as the training set with the np.tile function, and it works:
X_shared.set_value(np.tile(X_test,(math.ceil(len(y_train)/len(y_test)),1))[:len(y_train),:])
Is there a better way to solve this problem?
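For reference, a numpy sketch of what the np.tile workaround produces, with hypothetical sizes (500 training rows, 120 test rows, 10 features) and a fake posterior predictive array. The test rows are repeated until the array has as many rows as the training set, so only the first len(y_test) columns of each posterior predictive draw belong to the real test rows; the rest are duplicates.

```python
import math
import numpy as np

rng = np.random.default_rng(2)
n_train, n_test, n_features = 500, 120, 10
X_test = rng.normal(size=(n_test, n_features))

# Repeat X_test vertically until it covers n_train rows, then trim the excess.
reps = math.ceil(n_train / n_test)  # 5 here, since 500 / 120 is about 4.17
X_tiled = np.tile(X_test, (reps, 1))[:n_train, :]

# Hypothetical posterior predictive array for the tiled input: (n_draws, n_train).
ppc_tiled = rng.binomial(1, 0.5, size=(200, n_train))
# Only the first n_test columns correspond to the real test rows.
ppc_test = ppc_tiled[:, :n_test]
print(X_tiled.shape, ppc_test.shape)  # (500, 10) (200, 120)
```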

junpenglao · April 22, 2018, 7:32am

This is not necessary: you can split the data into training and testing sets just as you usually would in Keras or scikit-learn. Just make sure that you don't specify the row shape (the number of observations) in the model block.

For example, say you have a training set with shape (500, 10) and a testing set with shape (100, 10). In the model block, the random variables should only have the column size specified (for example, (10, 1)). In that case, when you call .set_value with the testing set to predict on it, it will work without matching the shape of the training set.
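A numpy sketch of why this works, assuming a logistic-regression-style model whose weights have shape (10, 1) as in the example; the posterior draws are synthetic, not from a fitted PyMC3 model, and predict_p is a hypothetical helper. The weights only fix the number of columns, so the same draws apply to a (500, 10) training matrix or a (100, 10) test matrix, and the number of predicted rows follows the input.

```python
import numpy as np

rng = np.random.default_rng(3)
n_draws, n_features = 1000, 10

# Synthetic posterior draws of the weights: one (n_features, 1) column per draw.
w_draws = rng.normal(scale=0.5, size=(n_draws, n_features, 1))

def predict_p(X, w_draws):
    """Posterior-mean P(y = 1) for each row of X under a logistic link."""
    logits = X @ w_draws            # (n_draws, n_rows, 1) via broadcasting matmul
    p = 1.0 / (1.0 + np.exp(-logits))
    return p.mean(axis=0)[:, 0]     # average over draws -> (n_rows,)

X_train = rng.normal(size=(500, n_features))
X_test = rng.normal(size=(100, n_features))

print(predict_p(X_train, w_draws).shape)  # (500,)
print(predict_p(X_test, w_draws).shape)   # (100,)
```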

jasmine2018jixun · April 22, 2018, 7:50am

Thank you very much for the explanation. I think I understand.
In my model, I'm trying to solve a multi-task learning problem, as in the following code:

beta = pm.MvNormal('beta', mu=np.zeros(num_tasks), chol=L, shape=(num_features, num_tasks))

I specified the row size of beta as well; maybe this caused the error. But there are 12 tasks and 24 features, so it seems I do not have another choice.
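A numpy sketch with the shapes from this post (24 features, 12 tasks), assuming the linear predictor is X @ beta and using a synthetic L as a stand-in for the Cholesky factor in the model; the row counts 300 and 50 are hypothetical. The draw of beta mimics the MvNormal line as presumably intended: each of the num_features rows is a num_tasks-vector with covariance L @ L.T. Neither dimension of beta is tied to the number of observations, so training and test matrices with different row counts both map to outputs of shape (n_rows, num_tasks).

```python
import numpy as np

rng = np.random.default_rng(4)
num_features, num_tasks = 24, 12

# Synthetic lower-triangular Cholesky factor standing in for L in the model.
A = rng.normal(size=(num_tasks, num_tasks))
L = np.linalg.cholesky(A @ A.T + num_tasks * np.eye(num_tasks))

# One draw of beta: row i equals L @ z[i], so each row is N(0, L @ L.T) over tasks.
z = rng.standard_normal((num_features, num_tasks))
beta = z @ L.T                     # shape (num_features, num_tasks)

# Neither dimension of beta depends on the number of rows of X.
X_train = rng.normal(size=(300, num_features))
X_test = rng.normal(size=(50, num_features))
print((X_train @ beta).shape)      # (300, 12)
print((X_test @ beta).shape)       # (50, 12)
```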
