A newbie question: How should I formulate a classification problem using PyMC3?

sam · July 24, 2018, 8:35pm

I have a classification problem.
I have a dataset of 1150 docs, each with a variable number of words (from a vocabulary of 100,000). The docs are labeled favorable / unfavorable.
Additionally, I have 800 unlabeled docs whose labels I have to predict.
I read through the example below:
http://docs.pymc.io/notebooks/lda-advi-aevb.html

I thought I should specify the number of topics as two, but in the example the topics are inferred automatically from the words in each document. In my case, 1150 of the documents already have a topic (favorable / unfavorable).

Can someone help me formulate the problem so that I can use pymc3 / lda / aevb, or parts thereof, to solve this classification problem?

My apologies for a rudimentary question.

junpenglao · July 25, 2018, 4:33am

It depends on your assumptions. If you assume that each document contains some latent topics and that these topics drive the favourable / unfavourable labelling, you can first fit an LDA model as in the doc and use its output (the inferred topics for each document) as input to a logistic regression.
You can also specify the number of topics as two; in that case you are effectively doing a large sparse logistic regression. You can still use a neural net, as in the LDA doc, as the approximation for inference.
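A minimal sketch of the two-stage idea (topics first, then a logistic regression on the topic weights). To keep it short and runnable it uses scikit-learn rather than PyMC3, so it gives point estimates instead of posteriors; the toy documents, the labels, and the choice of three topics are all made up for illustration.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical stand-ins for the 1150 labeled and 800 unlabeled docs.
labeled_docs = [
    "great product works well",
    "excellent service very happy",
    "love it great quality",
    "terrible product broke quickly",
    "awful service very unhappy",
    "hate it poor quality",
]
labels = [1, 1, 1, 0, 0, 0]  # 1 = favorable, 0 = unfavorable
unlabeled_docs = ["great quality very happy", "poor service broke quickly"]

# Stage 1: word counts -> per-document topic weights (LDA ignores the labels).
# Stage 2: logistic regression from topic weights to the label.
pipeline = make_pipeline(
    CountVectorizer(),
    LatentDirichletAllocation(n_components=3, random_state=0),  # 3 is arbitrary
    LogisticRegression(),
)
pipeline.fit(labeled_docs, labels)

predicted = pipeline.predict(unlabeled_docs)            # hard labels (0 or 1)
probabilities = pipeline.predict_proba(unlabeled_docs)  # one row per document
```

Because the LDA stage does not use the labels, it could also be fit on the labeled and unlabeled documents together before training the classifier on the labeled subset.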

colcarroll · July 25, 2018, 7:56am

I have not seen an example of naive Bayes with PyMC3, but that might also be a good approach here, right? The advantage (over, say, sklearn) would be having posterior estimates for each word. If you do not care about such estimates, then sklearn might be the right way to go.
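To make "posterior estimates for each word" concrete, here is a rough sketch of a multinomial naive Bayes model with a symmetric Dirichlet prior on each class's word probabilities. With that prior the posterior is again a Dirichlet, so plain NumPy is enough and no PyMC3 sampling is involved. The tiny vocabulary, the counts, and `alpha = 1.0` are assumptions for illustration, and prediction plugs in the posterior mean rather than integrating over the posterior.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: rows are documents, columns are word counts.
vocab = ["great", "happy", "poor", "broke"]
X = np.array([
    [3, 1, 0, 0],
    [2, 2, 0, 0],
    [0, 0, 2, 1],
    [0, 1, 1, 3],
])
y = np.array([1, 1, 0, 0])  # 1 = favorable, 0 = unfavorable
alpha = 1.0                 # symmetric Dirichlet prior concentration (assumed)

# Conjugate update: posterior concentration = prior + word counts in the class.
posterior_conc = np.stack([alpha + X[y == c].sum(axis=0) for c in (0, 1)])
posterior_mean = posterior_conc / posterior_conc.sum(axis=1, keepdims=True)

# Posterior draws give a 95% credible interval for every word in every class.
draws = np.stack([rng.dirichlet(posterior_conc[c], size=2000) for c in (0, 1)])
lower, upper = np.percentile(draws, [2.5, 97.5], axis=1)  # each has shape (2, V)

# Plug-in prediction: class log prior + word log-likelihood at the posterior mean.
log_prior = np.log(np.bincount(y) / len(y))

def predict(counts):
    log_post = counts @ np.log(posterior_mean).T + log_prior
    return log_post.argmax(axis=1)

new_docs = np.array([[2, 1, 0, 0], [0, 0, 1, 2]])
predicted = predict(new_docs)
```

The `lower` and `upper` arrays are what a point-estimate classifier does not give you: an uncertainty range for each word's probability in each class.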

sam · July 25, 2018, 12:18pm

@junpenglao & @colcarroll

Thank you both!

mkesin · July 25, 2018, 12:41pm

Here’s an older example; it requires some translation to PyMC3.

sam · July 31, 2018, 6:31pm

@mkesin,
Thank you
