Hello,
I’m new to PyMC3. To get started, I tried to implement a Naive Bayes (NB) model as described in this blog post, but for PyMC3 version 3.5. Here’s my initial attempt:
import numpy as np
import pymc3 as pm

K = 3  # number of topics
V = 4  # size of vocabulary

# 13 documents with 5 words each
data = np.array([[0, 0, 0, 0, 0],
                 [0, 0, 0, 0, 1],
                 [0, 0, 0, 1, 1],
                 [0, 0, 0, 0, 1],
                 [0, 0, 0, 0, 1],
                 [0, 0, 0, 0, 1],
                 [0, 0, 0, 0, 0],
                 [1, 1, 1, 2, 2],
                 [1, 1, 1, 1, 2],
                 [1, 1, 1, 2, 1],
                 [1, 1, 1, 2, 2],
                 [2, 2, 2, 3, 3],
                 [2, 2, 2, 3, 3]])

# number of documents
D = data.shape[0]

# Hyper-parameters
alpha = np.ones(K)
beta = np.ones(V)

with pm.Model() as model:
    # Global topic distribution
    theta = pm.Dirichlet("theta", a=alpha)

    # Word distributions for K topics
    phi = pm.Dirichlet("phi", a=beta, shape=(K, V))

    # Topic of documents
    z = pm.Categorical("z", p=theta, shape=D)

    # Words in documents
    for i in range(D):
        pm.Categorical(f"w_{i}",
                       p=phi[z[i]],
                       observed=data[i])
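For reference, here is the generative process I believe this model encodes, sketched in plain NumPy as a forward simulation (not the PyMC3 inference itself; the seed and sample sizes are just for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for illustration only

K, V, D, N = 3, 4, 13, 5  # topics, vocabulary size, documents, words per document

# Generative process:
theta = rng.dirichlet(np.ones(K))        # global topic distribution, shape (K,)
phi = rng.dirichlet(np.ones(V), size=K)  # per-topic word distributions, shape (K, V)
z = rng.choice(K, size=D, p=theta)       # one topic per document

# Each word of document i is drawn from the word distribution of its topic z[i]
docs = np.array([rng.choice(V, size=N, p=phi[z[i]]) for i in range(D)])
```

Simulating from this and comparing against the synthetic `data` above is how I convinced myself the structure matches an NB/mixture-of-Categoricals model.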
The results (not shown here) look reasonable, but I’m curious whether this implementation is correct at all and whether there’s a better way to implement it with PyMC3, especially the for loop at the end of the example. I’m aware of more efficient alternatives for estimating the parameters of an NB model, but I thought this would be a good example for getting started with PyMC3.
Thanks in advance for any help!
Best Regards,
Martin