A newbie question: How should I formulate a classification problem using PyMC3?

It depends on your assumptions. If you assume that each document contains some latent topics, and that it is the topics that drive the favourable/unfavourable labelling, you can first fit an LDA model as in the doc and then use the topics it infers for each document as input to a logistic regression.
You can also set the number of topics to two; that way you are effectively doing a large sparse logistic regression. You can still use a neural net, like in the LDA doc, as the approximation for inference.
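
To make the two-stage idea concrete, here is a minimal sketch. Note that it uses scikit-learn rather than PyMC3, purely to keep the example short, and that the toy corpus, the labels, and the choice of three topics are all invented for illustration. Stage one fits LDA to word counts; stage two feeds the per-document topic proportions into a logistic regression.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Toy corpus, invented for illustration: 1 = favourable, 0 = unfavourable.
docs = [
    "great product works well love it",
    "excellent quality great value love it",
    "works well great quality excellent",
    "love the value excellent product",
    "terrible product broke quickly hate it",
    "poor quality broke terrible waste",
    "hate it poor value waste of money",
    "broke quickly terrible waste hate",
]
labels = np.array([1, 1, 1, 1, 0, 0, 0, 0])

# Bag-of-words counts, the usual input to LDA.
counts = CountVectorizer().fit_transform(docs)

# Stage 1: fit LDA and get per-document topic proportions.
# n_components=3 is an arbitrary choice for this sketch.
lda = LatentDirichletAllocation(n_components=3, random_state=0)
lda.fit(counts)
theta = lda.transform(counts)  # shape (n_docs, n_topics), rows sum to 1

# Stage 2: logistic regression on the topic proportions.
clf = LogisticRegression()
clf.fit(theta, labels)
pred = clf.predict(theta)

print(theta.round(2))
print(pred)
```

In a PyMC3 version, the second stage would be replaced by a Bayesian logistic regression, for example normal priors on the coefficients and a Bernoulli likelihood on the labels. Fitting the two stages separately ignores the uncertainty in the topic proportions, which is one reason to consider a joint model instead.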
