Hi @iavicenna,
thanks for your help!
I had a look at the non-identifiability issue. Indeed, putting a constraint can help (Kucukelbir et al. used an ordering constraint on one of the gamma matrices: https://arxiv.org/pdf/1603.00788.pdf). To simplify things, I first removed the priors on the latent space and focused on recovering the ground truth.
First, for the data generation and the model, I use the following code:
```python
import numpy as np
import pandas as pd
from pytensor import tensor as tt

import pymc as pm


def print_callback(approx, losses, i):
    if i % 100 == 0:
        print(f"Iteration: {i}, Loss: {losses[-1]}")


# Define the parameters for the gamma distributions
shape = [(1, 0.2, 0.2),
         (0.2, 1, 0.2),
         (0.2, 0.2, 1)]
scale = 0.5

# Initialize an empty matrix
U = np.zeros((100, 3))

# Fill the matrix block-wise: rows 0-39, 40-79 and 80-99
# each use a different row of `shape`
for row in range(100):
    if row < 40:
        for col in range(3):
            U[row, col] = np.random.gamma(shape[0][col], scale)
    elif row < 80:
        for col in range(3):
            U[row, col] = np.random.gamma(shape[1][col], scale)
    else:
        for col in range(3):
            U[row, col] = np.random.gamma(shape[2][col], scale)

# Define parameters for the gamma distribution of V
shape = 2
scale = 0.5

# Generate the matrix
V = np.random.gamma(shape, scale, (3, 10000))

lambdas = U @ V
y = np.random.poisson(lambdas)
df = pd.DataFrame(y)

value_counts = df.stack().value_counts().head()
print(value_counts / df.size)
print(df.stack().describe())

n_components = 3
alpha_prior = lambdas.mean()
print(alpha_prior)

with pm.Model() as pmf:
    U_model = pm.Gamma('U', alpha=alpha_prior, beta=0.89,
                       shape=(df.shape[0], n_components))
    V_raw_model = pm.Gamma('V_raw', alpha=1, beta=1,
                           shape=(df.shape[1], n_components))
    # cumulative sum along the component axis makes each row of V
    # increasing, which breaks the permutation symmetry of the components
    increased_gamma_matrix = tt.cumsum(V_raw_model, axis=1)
    V_model = pm.Deterministic('V', increased_gamma_matrix)
    R = pm.Poisson('R', mu=pm.math.dot(U_model, V_model.T), observed=df)

with pmf:
    inference = pm.ADVI()
    callback = pm.callbacks.CheckParametersConvergence(tolerance=1e-2)
    approx = pm.fit(n=10_000,
                    method=inference,
                    callbacks=[callback, print_callback])
    trace = approx.sample(10_000 // 2)

U_post = trace.posterior["U"].mean(axis=1)[0].values
V_post = trace.posterior["V"].mean(axis=1)[0].values
reconstruction = U_post @ V_post.T
```
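One caveat when judging the result: even with the ordering constraint, the factorization is identified only up to a permutation (and rescaling) of the components, so `U_post` should be compared to the ground-truth `U` only after aligning columns. A minimal sketch of such an alignment (the `align_components` helper is hypothetical, not part of the model code above; it uses the Hungarian algorithm on column-wise correlations):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def align_components(U_true, U_est):
    """Permute the columns of U_est so they best match U_true,
    maximizing total column-wise correlation (Hungarian algorithm)."""
    k = U_true.shape[1]
    corr = np.corrcoef(U_true.T, U_est.T)[:k, k:]  # (k, k) cross-correlations
    _, col = linear_sum_assignment(-corr)          # maximize total correlation
    return U_est[:, col], col


# Toy check: a pure column permutation should be undone exactly.
rng = np.random.default_rng(0)
U_true = rng.gamma(2.0, 0.5, size=(100, 3))
U_est = U_true[:, [2, 0, 1]]                       # shuffle the components
U_aligned, order = align_components(U_true, U_est)
print(order)
```

Only after this alignment does an element-wise comparison of `U_post` against `U` (or of the reconstructions) say anything meaningful about recovery.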
So I expect three latent components, each with higher expression in one of the three classes (with 40, 40 and 20 observations, respectively). The V matrix is unstructured.
My follow-up research will focus on clustering the U_post matrix, so I decided to put the constraint on V in order to leave U untouched. However, the results are still not close to the ground truth.
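For the planned clustering step, a minimal sketch with KMeans; since the fit is not yet recovering the ground truth, a synthetic stand-in for `U_post` is used here (same 100×3 shape and 40/40/20 block structure as the generated U, but with an exaggerated separation so the clusters are clean):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)

# Stand-in for U_post: background gamma noise plus one dominant
# component per block, mirroring the 40/40/20 split used to generate U.
U_post = rng.gamma(0.2, 0.5, size=(100, 3))
U_post[:40, 0] += 2.0
U_post[40:80, 1] += 2.0
U_post[80:, 2] += 2.0

labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(U_post)
print(np.bincount(labels))
```

With a well-recovered U_post, the cluster sizes should come out as 40, 40 and 20 (in some order); large deviations from that are another quick diagnostic of how far the fit is from the ground truth.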
The following shows a snippet of U_post:
