Hi all,
I am trying to develop a Bayesian NMF model to decompose genetic data (namely, so-called SNP marker data). The data is special in that it contains only the values 0, 1, or 2. I want to employ a Gamma-Poisson model with automatic relevance determination (ARD).
My current version is as follows:
import numpy as np
import pymc as pm  # assumption: PyMC >= 4; with PyMC3 use `import pymc3 as pm`

# X has shape (n_samples, n_features)
mu = np.mean(X)
alpha_prior = mu**(1/2)  # based on the assumption that mean_gamma_U * mean_gamma_V -> mean_poisson
with pm.Model() as pmf:
    # ARD hyperparameters: one rate per latent component
    alpha_U = pm.Gamma('alpha_U', alpha=1, beta=1, shape=n_components)  # n_components is the targeted latent dimension, e.g. 5
    alpha_V = pm.Gamma('alpha_V', alpha=1, beta=1, shape=n_components)
    # Non-negative factors; alpha_U / alpha_V act as per-component Gamma rates
    U = pm.Gamma('U', alpha=alpha_prior, beta=alpha_U, shape=(X.shape[0], n_components))
    V = pm.Gamma('V', alpha=alpha_prior, beta=alpha_V, shape=(X.shape[1], n_components))
    # Poisson likelihood on the reconstructed matrix U V^T
    R = pm.Poisson('R', mu=pm.math.dot(U, V.T), observed=X)
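To make the assumption behind alpha_prior concrete, here is a small NumPy-only sketch of the Poisson rate mean implied by Gamma(shape, rate) factors when the rates are held fixed. The helper name conditional_rate_mean, the value mu = 0.6, and the unit rates are illustrative assumptions, not part of the model above. Since a Gamma(shape, rate) variable has mean shape / rate, the implied mean of one entry of U V^T sums over all components:

```python
import numpy as np

def conditional_rate_mean(shape_prior, rate_U, rate_V):
    """E[sum_k U_ik * V_jk | rates] for independent Gamma(shape, rate) factors."""
    rate_U = np.asarray(rate_U, dtype=float)
    rate_V = np.asarray(rate_V, dtype=float)
    # Each factor has mean shape / rate; means of independent factors multiply.
    return float(np.sum((shape_prior / rate_U) * (shape_prior / rate_V)))

mu = 0.6                 # hypothetical data mean, for illustration only
alpha_prior = mu ** 0.5  # same choice as in the model above

# One component with unit rates: the implied mean equals mu.
one_component = conditional_rate_mean(alpha_prior, [1.0], [1.0])
# Five components with unit rates: the implied mean is five times mu.
five_components = conditional_rate_mean(alpha_prior, np.ones(5), np.ones(5))

# Monte Carlo check of the five-component case.
# NumPy parameterizes the Gamma by scale = 1 / rate.
rng = np.random.default_rng(0)
U_draws = rng.gamma(alpha_prior, 1.0, size=(200_000, 5))
V_draws = rng.gamma(alpha_prior, 1.0, size=(200_000, 5))
mc_estimate = float(np.mean(np.sum(U_draws * V_draws, axis=1)))

print(one_component, five_components, mc_estimate)
```

Under these assumptions the implied mean grows with n_components, so mu**(1/2) matches the data mean only for a single component with unit rates.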
alpha_U and alpha_V should regulate the ARD process. However, I observe very similar posterior values across the components of alpha_U, and likewise across the components of alpha_V, independent of the data. I created some synthetic data based on gamma distributions with a latent dimension of 3, but I still obtain similar values for every component of alpha_U and alpha_V, even when I fit the model with a latent dimension of 5. In contrast, scikit-learn's NMF gives reasonable decompositions. I also tried using alpha_U for both U and V, without any improvement.
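A minimal sketch of this kind of synthetic test, using NumPy only. The function name make_synthetic_counts, the matrix sizes, the unit Gamma rates, and the target mean of 0.6 are illustrative assumptions and not necessarily the exact setup used. Note that Poisson draws are not capped at 2, unlike real SNP data.

```python
import numpy as np

def make_synthetic_counts(n_samples, n_features, true_rank, mean_target, seed=0):
    """Draw Gamma-distributed factors and Poisson counts with a known latent rank."""
    rng = np.random.default_rng(seed)
    # With unit rates, each entry of U V^T has mean true_rank * shape**2,
    # so this shape gives an expected count of mean_target.
    shape = np.sqrt(mean_target / true_rank)
    U_true = rng.gamma(shape, 1.0, size=(n_samples, true_rank))
    V_true = rng.gamma(shape, 1.0, size=(n_features, true_rank))
    X = rng.poisson(U_true @ V_true.T)
    return X, U_true, V_true

# Ground truth has 3 components; the model is then fit with n_components = 5.
X_syn, U_true, V_true = make_synthetic_counts(200, 50, true_rank=3, mean_target=0.6)
print(X_syn.shape, X_syn.mean())
```

If ARD worked as intended on such data, two of the five fitted components would be expected to be switched off, which should show up as clearly different posterior values for the corresponding entries of alpha_U and alpha_V.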
Can anyone help me with this, or has anyone had similar experiences?
Thank you all in advance!
Best,
HStoneCreek