Bayesian Non-negative Matrix Factorization using a Gamma-Poisson model

One general point on workflow: I strongly suggest you ditch generating test data with numpy, and instead use your PyMC model directly to do it. It’s cleaner and helps avoid errors that can crop up whenever one has to do the same thing twice (to say nothing of subtle differences in parameterization/broadcasting between pymc and numpy).

Here’s how things could look in your case:

import pymc as pm
import arviz as az
import numpy as np

seed = sum(map(ord, 'Gamma Poisson Matrix'))
rng = np.random.default_rng(seed)

n_users = 100
n_items = 10
n_components = 3
n_obs = 10
coords = {'component': np.arange(n_components, dtype='int'),
          'user': np.arange(n_users, dtype='int'),
          'item': np.arange(n_items, dtype='int'),
          'obs_idx': np.arange(n_obs, dtype='int')}
with pm.Model(coords=coords) as pmf:
    alpha_U = pm.Gamma('alpha_U', alpha=2, beta=1, dims=['component'])
    alpha_V = pm.Gamma('alpha_V', alpha=2, beta=1, dims=['component'])
    U = pm.Gamma('U', alpha=2, beta=alpha_U, dims=['user', 'component'])
    V = pm.Gamma('V', alpha=2, beta=alpha_V, dims=['item', 'component'])
    # cumsum over the component axis makes each item's loadings increasing,
    # which breaks the label-switching symmetry between components
    R = pm.Poisson('R', mu=U @ V.cumsum(axis=-1).T, dims=['obs_idx', 'user', 'item'])
    prior = pm.sample_prior_predictive(random_seed=rng)

# Choose a prior draw at random to serve as the ground truth data
random_idx = rng.choice(prior.prior.draw)
test_sample = prior.prior.sel(chain=0, draw=random_idx)

# Use pm.observe to clone your model, inserting the chosen draw as observed data, and attempt to recover the parameters
with pm.observe(pmf, {'R': test_sample.R.values}) as m:
    idata = pm.sample(nuts_sampler='nutpie')

# Make a bunch of diagnostic plots, for example:
az.plot_trace(idata, var_names=[x.name for x in m.free_RVs]);
az.plot_posterior(idata, var_names=['alpha_U'], ref_val=test_sample.alpha_U.values.tolist())
az.plot_posterior(idata, var_names=['alpha_V'], ref_val=test_sample.alpha_V.values.tolist())
az.plot_posterior(idata, var_names=['U'], 
                  coords={'user':0}, 
                  ref_val=test_sample.U.sel(user=0).values.tolist());

As for the actual modeling, the transformation suggested in the paper (and mentioned in the old issue you linked) is pm.distributions.transforms.ordered. If you use it, you also have to set initial values for the distribution via the initvals keyword of pm.sample, or else you can end up with np.nan at the sampler's starting point.

I also wasn’t super clear on the structure of your data, whether it’s just one giant matrix or a stack of matrices (e.g. over time or experiments), so I added that obs_idx as a batch dim. It can obviously be removed if you only have a single matrix. I took the dim names from the example in the linked paper’s appendix, so they’re obviously not all relevant to your model. Either way, I recommend you use named dims; they make it much easier to communicate your model to others.
