Bayesian Non-negative Matrix Factorization using a Gamma-Poisson model

Hi @iavicenna,

thanks for your help!
I had a look at the non-identifiability issue. Indeed, adding a constraint can help (Kucukelbir et al. used an ordering constraint on one of the gamma matrices: https://arxiv.org/pdf/1603.00788.pdf). To simplify the problem, I decided to first remove the priors on the latent space and focus on recovering the ground truth.
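For reference, the ordering constraint boils down to a cumulative sum of positive variables: each row becomes strictly increasing, which pins down a unique ordering of the components. A minimal NumPy sketch of the idea (illustration only, not part of the model below):

import numpy as np

raw = np.random.gamma(1.0, 1.0, size=(5, 3))  # positive entries
ordered = np.cumsum(raw, axis=1)              # each row is strictly increasing
assert (np.diff(ordered, axis=1) > 0).all()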
First, regarding the data generation and the model, I use the following code:


import numpy as np
import pandas as pd
from pytensor import tensor as tt
import pymc as pm

def print_callback(approx, losses, i):
    if i % 100 == 0:
        print(f"Iteration: {i}, Loss: {losses[-1]}")

# Define the parameters for the gamma distributions
shape = [(1, 0.2, 0.2),
         (0.2, 1, 0.2),
         (0.2, 0.2, 1)]
scale = 0.5

# Initialize an empty matrix
U = np.zeros((100, 3))

# Fill the matrix: rows 0-39 belong to class 0, rows 40-79 to class 1,
# rows 80-99 to class 2; each class has its own gamma shape profile
for row in range(100):
    class_idx = 0 if row < 40 else (1 if row < 80 else 2)
    for col in range(3):
        U[row, col] = np.random.gamma(shape[class_idx][col], scale)

# Define parameters for the gamma distribution
shape = 2
scale = 0.5

# Generate the matrix
V = np.random.gamma(shape, scale, (3, 10000))
lambdas = U @ V

y = np.random.poisson(lambdas)
df = pd.DataFrame(y)

value_counts = df.stack().value_counts().head()
print(value_counts/df.size)
print(df.stack().describe())

n_components = 3
# Gamma shape for the U prior, set here to the mean of the ground-truth rates
# (only available in this simulation setting)
alpha_prior = lambdas.mean()
print(alpha_prior)

with pm.Model() as pmf:
    U_model = pm.Gamma('U', alpha=alpha_prior, beta=0.89, shape=(df.shape[0], n_components))
    V_raw_model = pm.Gamma('V_raw', alpha=1, beta=1, shape=(df.shape[1], n_components))

    # Ordering constraint (Kucukelbir et al.): cumsum of positive gammas makes
    # each row of V increasing across components, fixing their order
    increased_gamma_matrix = tt.cumsum(V_raw_model, axis=1)
    V_model = pm.Deterministic('V', increased_gamma_matrix)

    R = pm.Poisson('R', mu=pm.math.dot(U_model, V_model.T), observed=df)

with pmf:
    inference = pm.ADVI()
    callback = pm.callbacks.CheckParametersConvergence(tolerance=1e-2)
    approx = pm.fit(n=10_000,
                    method=inference,
                    callbacks=[callback, print_callback])
    trace = approx.sample(10_000 // 2)
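    # Optional: inspect the loss trajectory to judge convergence
    # (approx.hist stores the negative-ELBO loss per iteration)
    import matplotlib.pyplot as plt
    plt.plot(approx.hist)
    plt.xlabel("iteration")
    plt.ylabel("loss (negative ELBO)")
    plt.show()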

# Posterior means: average over draws, then take the single ADVI "chain"
U_post = trace.posterior["U"].mean(dim="draw").isel(chain=0).values
V_post = trace.posterior["V"].mean(dim="draw").isel(chain=0).values

reconstruction = U_post @ V_post.T
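
# Quick sanity check (assumption: the ground-truth rate matrix `lambdas`
# from the generation step above is still in scope)
rel_error = np.linalg.norm(reconstruction - lambdas) / np.linalg.norm(lambdas)
print(f"Relative reconstruction error: {rel_error:.3f}")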

So, I expect three latent components, each with higher expression in one of the three classes (with 40, 40, and 20 observations, respectively). The V matrix is unstructured.

My follow-up research will focus on clustering the U_post matrix. Thus, I decided to put the constraint on V so as to leave U untouched. The results, however, are still not close to the ground truth.
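One caveat when comparing to the ground truth: the factorization is only identified up to a permutation (and scaling) of the components, so I align the posterior components to the ground-truth columns before comparing. A minimal sketch, assuming U and U_post from the code above are still in scope (correlation is scale-invariant, so it also absorbs the scaling ambiguity):

from itertools import permutations

def align_components(U_true, U_est):
    # Brute-force over the 3! = 6 column permutations and keep the one
    # with the highest mean per-column correlation
    best_perm, best_corr = None, -np.inf
    for perm in permutations(range(U_est.shape[1])):
        corr = np.mean([np.corrcoef(U_true[:, k], U_est[:, p])[0, 1]
                        for k, p in enumerate(perm)])
        if corr > best_corr:
            best_perm, best_corr = perm, corr
    return U_est[:, list(best_perm)], best_corr

U_aligned, corr = align_components(U, U_post)
print(f"Mean per-column correlation after alignment: {corr:.3f}")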
The following shows a snippet of U_post: