# Naive Bayes model with PyMC3

#1

Hello,

I’m new to PyMC3 and for getting started I tried to implement a Naive Bayes (NB) model as described in this blog post but for version 3.5 of PyMC. Here’s my initial attempt:

``````import numpy as np
import pymc3 as pm

K = 3 # number of topics
V = 4 # size of vocabulary

# 13 documents with 5 words each
data = np.array([[0, 0, 0, 0, 0],
[0, 0, 0, 0, 1],
[0, 0, 0, 1, 1],
[0, 0, 0, 0, 1],
[0, 0, 0, 0, 1],
[0, 0, 0, 0, 1],
[0, 0, 0, 0, 0],
[1, 1, 1, 2, 2],
[1, 1, 1, 1, 2],
[1, 1, 1, 2, 1],
[1, 1, 1, 2, 2],
[2, 2, 2, 3, 3],
[2, 2, 2, 3, 3]])

# number of documents
D = data.shape[0]

# Hyper-parameters
alpha = np.ones(K)
beta = np.ones(V)

with pm.Model() as model:
# Global topic distribution
theta = pm.Dirichlet("theta", a=alpha)

# Word distributions for K topics
phi = pm.Dirichlet("phi", a=beta, shape=(K, V))

# Topic of documents
z = pm.Categorical("z", p=theta, shape=D)

# Words in documents
for i in range(D):
pm.Categorical(f"w_{i}",
p=phi[z[i]],
observed=data[i])
``````

The results (not shown here) look reasonable but I’m curious if this implementation is correct at all and if there’s a better way to implement that with PyMC3, especially the `for` loop at the end of the example. I’m aware of more efficient alternatives for estimating the parameters of a NB model but I thought this is a good example for getting started with PyMC3.

Thanks in advance for any help!

Best Regards,
Martin

Multilevel conditional multinomials in MCMC
#2

I think your problem is quite similar to latent dirichlet allocation - you can have a look at this doc.

Otherwise, a general suggestion on mixture model like this kind is not to explicitly model the latent categories (i.e., avoid doing `z = pm.Categorical("z", p=theta, shape=D)`), but rewrite it as a marginalized mixture model instead.

#3

Thanks for your response. I’ve seen the document you mentioned but wanted to have a much simpler example running. The implementation was actually inspired by the LDA example you posted earlier.

Regarding

do you mean something comparable to Marginalized Gaussian Mixture Model but using the general `Mixture` class instead?

#4

I tried using the `Mixture` class following the examples in the source code but got stuck applying it to my example here. Can you please show me how to re-write my example to a marginalized mixture model? I tried several things but had problems matching shapes of variables and observed data.

#5

Yes you do need to be careful to make the shape work… Maybe rewriting the observed likelihood function as a potential or Densitydist is a bit easier, since you can reuse the loglike function from the doc and the LDA post.

#6

To verify my understanding I implemented a marginalized multivariate GMM. First, I generated some data :

``````from sklearn.datasets import make_blobs

K = 5
n_samples = 150

X, y = make_blobs(n_samples=n_samples, centers=K)
plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.viridis, alpha=.25);
``````

and then fit the following model to data `X` (shape=(150, 2)):

``````cov = np.array([[1, 0],
[0, 1]])

with pm.Model() as model:
w = pm.Dirichlet('w', np.ones(K))

components = []

for i in range(K):
mu = pm.Normal('mu%i'%i, 0, 10, shape=2)
components.append(pm.MvNormal.dist(mu=mu, cov=cov))

xobs = pm.Mixture('x_obs', w, components, observed=X)
``````

This works as expected, no problems here.

The marginalized version of the NB model I initially posted has the same structure (except for variable types):

``````# Definitions of K, V, D, data as in my initial post
components = []

with pm.Model() as model:
theta = pm.Dirichlet("theta", a=alpha)

for i in range(K):
phi = pm.Dirichlet(f"phi_{i}", a=beta)
components.append(pm.Categorical.dist(p=phi))

obs = pm.Mixture('obs', w=theta, comp_dists=components, observed=data)
``````

but in this case I get

``````ValueError: Input dimension mis-match. (input[0].shape[2] = 3, input[1].shape[2] = 5)
``````

which I do not understand. I can get a working model definition when I define a `Mixture` for each document:

``````with pm.Model() as model:
theta = pm.Dirichlet("theta", a=alpha)

for i in range(K):
phi = pm.Dirichlet(f"phi_{i}", a=beta)
components.append(pm.Categorical.dist(p=phi))

for i in range(D):
pm.Mixture(f'obs_{i}', w=theta, comp_dists=components, observed=data[i])
``````

but in this case the sampling results don’t make sense. For example, inferred `theta` looks like

I really would like to understand why

• `obs = pm.Mixture('obs', w=theta, comp_dists=components, observed=data)` leads to a dimension mis-match error
• fitting with the last model definition gives results for `theta` that do not make sense given the data. (With the non-marginalized NB model version from my initial post I get reasonable values for `theta`).

#7

I will try using `DensityDist` later as you suggested but still would like to know why the previous approach using `Categorical` doesn’t work. I mean the example problem is so simple and I really would like to understand why this can’t be solved with built-in distributions/functions without resorting to `DensityDist`.

#8

I will try to have a look soon - a bit busy this week

#9

No rush, thanks a lot, appreciate it!

#10

So here is how I would approach it:
First, I would rewrite the explicit model, so that the observed is a flatten array - it is easier to handle shape wise:

``````# flatten and index data
data_flatten = np.reshape(data, data.shape[0]*data.shape[1])
data_index = np.repeat(np.arange(data.shape[0]), data.shape[1])

with pm.Model() as model:
# Global topic distribution
theta = pm.Dirichlet("theta", a=alpha)

# Word distributions for K topics
phi = pm.Dirichlet("phi", a=beta, shape=(K, V))

# Topic of documents
z = pm.Categorical("z", p=theta, shape=D)

# Words in documents
p = phi[z][data_index]
w = pm.Categorical("w", p=p, observed=data_flatten)
trace = pm.sample(1000, tune=1000)
``````

Now, to turn this into a marginalized model, we need to not do indexing of the `p` in `p = phi[z]`, but keep the dimensions of the latent topics. Computationally, we want the logp to be evaluated for each latent topic (not just for the “true” latent label):

``````with pm.Model() as model_marg:
# Word distributions for K topics
phi = pm.Dirichlet("phi", a=beta, shape=(K, V))

# Topic of documents
z = pm.Dirichlet("z", a=alpha/2., shape=(D, K))

# Global topic distribution
theta = pm.Deterministic("theta", z.mean(axis=0))

# Words in documents
comp_dists = pm.Categorical.dist(phi)
w = pm.Mixture("w",
w=z[data_index, :],
comp_dists=comp_dists,
observed=data_flatten)
trace_marg = pm.sample(1000, tune=1000)
``````

you can check the mixture component shape to make sure that is the case:

``````w.distribution.w.tag.test_value.shape
w.distribution._comp_logp(data_flatten).tag.test_value.shape
``````

Of course, the two model is not identical, and there are inference problem like multi-modality for both of them… Mixture model is difficult unfortunately.

I put everything in a notebook here: https://github.com/junpenglao/Planet_Sakaar_Data_Science/blob/master/PyMC3QnA/discourse_2314.ipynb

Looping through random variables
Latent Class Analysis Model (Dirichlet / Categorical Implementation Question)
#11

Thanks a lot @junpenglao for taking time writing this up. It helped me a lot!

Regarding inference problems, I found that when using sparse Dirichlet priors i.e.

``````alpha = np.ones(K) * 0.5
beta = np.ones(V) * 0.5
``````

results significantly improve. For example, for the first model, inferred `theta` matches much better the actual global topic distribution:

I couldn’t see that improvement for the marginalized mixture model though.

#12