find_MAP giving all zeros for sparse data


I’m trying to re-implement probabilistic matrix factorization (e.g. as described in section 2 of this paper). Below is the gist of my code. tfidf is a (somewhat) large scipy.sparse.csr.csr_matrix, with entries that are positive and no larger than 1.0. Notice that I’m jumping through hoops with my definition of R_nonzero to keep the likelihood from taking all the zeros into account (i.e., this is how I implement the I_{ij} indicator described in the paper).
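To make the masking concrete, here is a tiny self-contained example of what scipy.sparse.find returns (a toy matrix, not my real tfidf):

```python
import numpy as np
import scipy.sparse

# Toy sparse matrix standing in for tfidf: zeros are "unobserved".
R = scipy.sparse.csr_matrix(np.array([[0.5, 0.0, 0.0],
                                      [0.0, 0.9, 0.0]]))
rows, columns, entries = scipy.sparse.find(R)

# Only the nonzero (i, j) pairs survive, which is exactly the I_{ij}
# indicator: a likelihood built from rows/columns/entries never
# touches the zero cells.
print(sorted(zip(rows, columns, entries)))
```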

import numpy as np
import scipy.sparse
import pymc3 as pm
import theano.tensor as tt

rows, columns, entries = scipy.sparse.find(tfidf)

n, m = tfidf.shape
dim = 20
sigma = 0.15
sigma_u = 0.02
sigma_v = 0.02

with pm.Model() as pmf:
    U = pm.Normal('U', mu=0, sd=sigma_u, shape=[n, dim])
    V = pm.Normal('V', mu=0, sd=sigma_v, shape=[m, dim])
    R_nonzero = pm.Normal('R_nonzero',
                          mu=tt.sum(U[rows, :] * V[columns, :], axis=1),
                          sd=sigma,
                          observed=entries)
    map_estimate = pm.find_MAP()

The problem is that map_estimate now comes up with U and V as entirely zero matrices! find_MAP also converges in only 2 iterations, which I find a bit suspicious… is it possible that find_MAP is somehow stopping too early?

Thanks for your time!


Hm, actually, the same paper mentions that probabilistic matrix factorization usually performs poorly on sparse or imbalanced data, and that I should probably try Bayesian matrix factorization instead. So if there isn’t an obvious bug/suggestion out there, it might just be a bad model :smile:

I think there is a problem with the default optimizer - if I remember correctly, the original notebook chose a different optimizer from scipy.

The original notebook set method='L-BFGS-B', which is the default (weird, but alright).

I tried it again with method='Powell', and it returns slightly better results - V is no longer entirely zero, but U still is. The Powell method also does not use any gradient information, so that seems suspicious…


I think MAP always takes less time. Secondly, if U and V are coming out as entirely zero matrices, maybe try setting a different prior such as pm.HalfNormal or pm.Lognormal. However, I don’t have much idea about probabilistic matrix factorisation, so these are just guesses which you can try. :thinking:

Speaking of that, I think setting a new initial value might also help. But in general, MAP is not very reliable for complex problems.
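To see why the starting point matters here: for the standard PMF objective (squared reconstruction error plus Gaussian priors), U = V = 0 is a stationary point, so a gradient-based optimizer that starts there has nowhere to go. A quick numpy sketch, assuming that objective and using toy values:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, dim = 4, 3, 2
R = rng.random((n, m))            # toy "ratings", all observed for simplicity
sigma, sigma_u = 0.15, 0.02

def grad_U(U, V):
    # d/dU of the PMF negative log posterior:
    # ||R - U V^T||^2 / (2 sigma^2) + ||U||^2 / (2 sigma_u^2)
    return -(R - U @ V.T) @ V / sigma**2 + U / sigma_u**2

U0 = np.zeros((n, dim))
V0 = np.zeros((m, dim))
print(np.abs(grad_U(U0, V0)).max())  # 0.0: the gradient vanishes at the origin
```

That would explain the suspicious 2-iteration "convergence": the optimizer starts at a stationary point and immediately declares victory.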

Do you mean PyMC3’s implementation of find_MAP, or just MAP methods in general?

In general, I am always skeptical of using one point in a complex space to represent the whole space.
