Probability estimation in pymc3

Hi everyone,
I apologize if this sounds like a beginner question, but I could not find the answer by Googling or searching this forum.
As far as I know, sklearn and keras have a function named predict_proba, which returns the probability of each class for a sample. Is there a similar function in pymc3? I'm also confused about the meaning of the output of sample_ppc: if I take the mean over axis=0, is that the result I need? For example, in a binary classification problem, after I pass X_test to X_shared, how do I calculate the probability that a sample in X_test is predicted as 0 or 1?

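Yes. For binary classification, each posterior predictive draw is 0 or 1, so the mean over axis=0 of the sample_ppc output is the estimated probability of class 1 for each test sample (and one minus that is the probability of class 0).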
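
Here is a minimal sketch of that averaging step using NumPy only. The array ppc_obs is a simulated stand-in for what sample_ppc would return for the observed variable (the name "obs" and the true_p values are made up for illustration); with a real model you would use the array from your own ppc dictionary instead.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated stand-in for ppc["obs"]: one row per posterior draw,
# one column per test sample, entries 0 or 1.
# true_p is a made-up set of class-1 probabilities for three test samples.
true_p = np.array([0.1, 0.5, 0.9])
ppc_obs = rng.binomial(1, true_p, size=(2000, true_p.size))

# Averaging over the draw axis gives P(y = 1) for each test sample.
p_class1 = ppc_obs.mean(axis=0)

# Two-column array in the style of predict_proba: [P(y = 0), P(y = 1)].
proba = np.column_stack([1.0 - p_class1, p_class1])
print(proba)
```

Each row of proba sums to 1, and the estimates get closer to the underlying probabilities as the number of draws grows.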

Things are a bit more complicated if you are doing multi-class classification using a categorical distribution (e.g., observed = pm.Categorical('obs', p=p, observed=yt)). In that case the draws are class labels, so averaging them is not meaningful; you need to find a way to sample the node p instead.
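
One possible way to get at p, sketched with NumPy only: recompute it from the posterior draws of the parameters and average over draws. This assumes a softmax regression, which the thread does not specify; beta_draws is a hypothetical stand-in for something like trace["beta"], and X_test is made-up data.

```python
import numpy as np

rng = np.random.default_rng(1)
n_draws, n_test, n_features, n_classes = 500, 4, 3, 3

# Hypothetical stand-ins: beta_draws plays the role of posterior draws of
# the coefficients, X_test is the held-out design matrix.
beta_draws = rng.normal(size=(n_draws, n_features, n_classes))
X_test = rng.normal(size=(n_test, n_features))

# Logits for every draw and test row: shape (n_draws, n_test, n_classes).
logits = np.einsum("nf,dfc->dnc", X_test, beta_draws)

# Numerically stable softmax over the class axis gives p for each draw.
logits -= logits.max(axis=-1, keepdims=True)
p_draws = np.exp(logits)
p_draws /= p_draws.sum(axis=-1, keepdims=True)

# Averaging over draws gives one probability vector per test row.
proba = p_draws.mean(axis=0)
print(proba.shape)
```

The result has one row per test sample and one column per class, which is the multi-class analogue of the binary mean over axis=0.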

Thanks so much for your reply.
I'm also confused about the shape of the test set when using theano.shared. I tried making the test set the same shape as the training set with np.tile, and it works:
X_shared.set_value(np.tile(X_test,(math.ceil(len(y_train)/len(y_test)),1))[:len(y_train),:])
Is there a better way to handle this?

This is not necessary - you can split the training set and testing set just like you usually do in Keras or scikit-learn. Just make sure that in the model block you don't specify the row shape.

For example, say you have a training set with shape (500, 10) and a testing set with shape (100, 10). In the model block, the random variables should only have the column size specified (for example (10, 1)). In that case, when you call .set_value to predict on the testing set, it will work without matching the shape of the training set.
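
To see why only the column size matters, here is a small NumPy sketch of the shape logic. It is not a pymc3 model: beta is a stand-in for a single posterior draw of a coefficient declared with shape (10, 1), and linear_predictor is a hypothetical helper.

```python
import numpy as np

rng = np.random.default_rng(2)
X_train = rng.normal(size=(500, 10))
X_test = rng.normal(size=(100, 10))

# Stand-in for one draw of a coefficient declared with shape=(10, 1):
# its shape depends on the number of features, not the number of rows.
beta = rng.normal(size=(10, 1))

def linear_predictor(X, beta):
    """Hypothetical helper: works for any number of rows in X."""
    return X @ beta

eta_train = linear_predictor(X_train, beta)
eta_test = linear_predictor(X_test, beta)
print(eta_train.shape, eta_test.shape)
```

Because nothing in beta refers to the number of rows, the same expression accepts 500 rows or 100 rows, which is why swapping the shared variable's value does not require tiling the test set.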

Thank you very much for the explanation. I think I understand.
In my model, I'm trying to solve a multi-task learning problem, with the following code:

beta = pm.MvNormal('beta', mu=np.zeros(num_tasks), chol=L, shape=(num_features, num_tasks))

I specified the row size of beta as well, which may have caused the error. But there are 12 tasks and 24 features, so it seems I have no other choice.