# find_MAP giving all zeros for sparse data

Hello!

I’m trying to re-implement probabilistic matrix factorization (e.g. as described in section 2 in this paper). Below is the gist of my code. `tfidf` is a (somewhat) large `scipy.sparse.csr.csr_matrix`, with entries that are positive and no larger than `1.0`. Notice that I’m jumping through hoops with my definition of `R_nonzero` in order to avoid having my likelihood take all the zeros into account (i.e., this is how I implement the I_{ij} described in the paper).

```python
import numpy as np
import scipy.sparse
import pymc3 as pm
import theano.tensor as tt

# Keep only the observed (non-zero) entries; this is how I implement the
# I_ij indicator from the paper
rows, columns, entries = scipy.sparse.find(tfidf)

n, m = tfidf.shape
dim = 20
sigma = 0.15
sigma_u = 0.02
sigma_v = 0.02

with pm.Model() as pmf:
    U = pm.Normal('U', mu=0, sd=sigma_u, shape=[n, dim])
    V = pm.Normal('V', mu=0, sd=sigma_v, shape=[m, dim])
    # Likelihood defined only over the observed entries
    R_nonzero = pm.Normal('R_nonzero',
                          mu=tt.sum(U[rows, :] * V[columns, :], axis=1),
                          sd=sigma,
                          observed=entries)

    map_estimate = pm.find_MAP()
```

The problem is that `map_estimate` now returns `U` and `V` as all-zero matrices! `find_MAP` also converges in only 2 iterations, which I find a bit suspicious… is it possible that `find_MAP` is stopping too early, somehow?
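For what it's worth, here is the quick sanity check I was thinking of running (a sketch, assuming I'm remembering the PyMC3 API correctly): compare the log-posterior at the returned estimate with the log-posterior at the model's default starting point. If the all-zeros solution really does score higher, then the optimizer isn't stopping early, the mode just sits at zero under these tight priors.

```python
# Sketch: check whether the all-zeros estimate actually has higher
# log-posterior than the default starting point of the model above.
with pmf:
    print('logp at MAP estimate:', pmf.logp(map_estimate))
    print('logp at test point:  ', pmf.logp(pmf.test_point))
```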


Hm, actually, the same paper mentions that probabilistic matrix factorization usually performs poorly on sparse or imbalanced data, and that I should probably try Bayesian matrix factorization instead. So if there isn’t an obvious bug/suggestion out there, it might just be a bad model.

I think there is a problem with the default optimizer - if I remember correctly, in the original notebook a different optimizer from scipy was chosen.

The original notebook set `method='L-BFGS-B'`, which is the default (weird, but alright).

I tried it again with `method='Powell'`, and it returns slightly better results - `V` is no longer entirely zero, but `U` still is. The Powell method also does not use any gradient information, so that seems suspicious…
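For reference, switching optimizers is just a keyword argument to `find_MAP`, which is forwarded to `scipy.optimize.minimize` (a sketch, reusing the model from the first post):

```python
# Powell is gradient-free; the default ('L-BFGS-B') uses gradients
with pmf:
    map_estimate_powell = pm.find_MAP(method='Powell')
```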

Hi,

I think finding the MAP always takes less time than sampling. Secondly, if `U` and `V` are coming out as entirely zero matrices, maybe try a different prior such as `pm.HalfNormal` or `pm.Lognormal`. However, I don’t have much idea about probabilistic matrix factorisation, so these are just guesses which you can try. Speaking of that, I think setting a new initial value might also help. But in general, MAP is not very reliable in complex problems.
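As a rough sketch of the initial-value idea (the random starting point and its scale here are just an illustration, not a recommendation):

```python
# Sketch: give the optimizer a non-zero starting point instead of the
# default test point. The 0.05 scale is an arbitrary choice for illustration.
start = {
    'U': 0.05 * np.random.randn(n, dim),
    'V': 0.05 * np.random.randn(m, dim),
}
with pmf:
    map_estimate = pm.find_MAP(start=start)
```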

Do you mean PyMC3’s implementation of `find_MAP`, or just MAP methods in general?

In general, I’m always skeptical of using one point in a complex space to represent the whole space.
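If that’s the concern, a sketch of exploring the posterior instead of relying on a point estimate, reusing the model above (for a model this size, ADVI may be the more practical option):

```python
with pmf:
    # Sample the full posterior with NUTS
    trace = pm.sample(1000, tune=1000)

    # Or fit a variational approximation (ADVI) and draw from it
    approx = pm.fit(n=20000)
    trace_vi = approx.sample(1000)
```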
