# Sparse observed tensor?

I’m trying to implement Bayesian Q-learning in PyMC3. I’m holding a Q-table where each cell has 2 random variables, mu and sd: mu represents the estimated Q-value of a given state-action pair, and sd represents the “certainty” around that value.

The problem I’m having is that each time step offers only 1 state-action-reward example, so we have an observed variable that is a very sparse 3-dimensional tensor: [time x number of actions x number of states]. The code of my dense model looks like this:

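``````
with self.model:
    # One mean and one log-scale spread per (action, state) cell
    Qmus = pm.Normal("Qmus", mu=0., sd=1., shape=[2, self.D])
    Qsds = pm.Normal("Qsds", mu=0., sd=1., shape=[2, self.D])

    # Every cell is treated as observed at every time step
    pm.Normal('Qtable', mu=Qmus, sd=np.exp(Qsds), observed=full_tensor)

    # mean_field is defined elsewhere (not shown here)
    self.trace = mean_field.sample(5000)
``````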

This of course doesn’t work well, because it floods the model with 0-reward examples when in fact those state-action pairs were simply not visited at that time step.
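To make the problem concrete, here is a minimal NumPy sketch (not PyMC3) of how zero-padding biases an estimate. The toy sizes, the reward of 2.0, and the `visited` mask are all assumptions made up for illustration; they are not part of my actual model.

```python
import numpy as np

# Hypothetical toy setup: 10 time steps, 2 actions, 3 states.
n_steps, n_actions, n_states = 10, 2, 3
full_tensor = np.zeros((n_steps, n_actions, n_states))
visited = np.zeros_like(full_tensor, dtype=bool)

# Only the cell (action=1, state=2) is ever visited, at three time steps,
# and it yields a reward of 2.0 each time.
for t in (0, 4, 7):
    full_tensor[t, 1, 2] = 2.0
    visited[t, 1, 2] = True

# Treating the zero padding as data drags the estimate towards 0.
dense_mean = full_tensor[:, 1, 2].mean()

# Averaging only over the actual visits recovers the reward.
visited_mean = full_tensor[visited].mean()

print(dense_mean, visited_mean)
```

The dense average comes out at 0.6 instead of 2.0, because seven of the ten "observations" for that cell are padding rather than rewards.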

What would be a better way to do this? Can we somehow update 1 “cell” of a multidimensional normal variable at a time? The post “Using sparse matrices as observed in DensityDist” seems related to my problem, but I’m having a hard time understanding the answer.

I seem to have solved my problem. If anyone else runs into this, here is my solution. I created a custom likelihood function (which I call via a pm.DensityDist) that reads the sparse tensor indices, i.e. value[:, 0] and value[:, 1], and uses them to index into my multivariate random variables (the Q-table cells, in my case) for each sample in the observed data.

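``````
import pymc3 as pm
import theano.tensor as tt

def likelihood(value):
    # Columns 0 and 1 are cell indices; int64 avoids int8 overflow above 127.
    idx0 = tt.cast(value[:, 0], dtype='int64')
    idx1 = tt.cast(value[:, 1], dtype='int64')
    # Assumes Qsds is on the log scale, as in the dense model, so sd stays positive.
    return pm.Normal.dist(mu=Qmus[idx0, idx1], sd=tt.exp(Qsds[idx0, idx1])).logp(value[:, 2])
``````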

The shape of my value tensor is: [number of examples x 3]. The 3-length vector is for: [index into my table rows, index into my table columns, actual value of the table cell]. This allows me to have a sparse tensor where each instance refers to only one particular cell, and therefore to update only that particular value of my multivariate RVs.
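For anyone who wants to check the indexing without running PyMC3, here is a rough NumPy sketch of the same idea. The helper names (`to_triplets`, `normal_logp`, `sparse_loglik`), the `visited` mask, and the toy data are hypothetical, and the spread is assumed to be stored on the log scale. It is only meant to show how the [number of examples x 3] array can be built and how each row picks out a single cell.

```python
import numpy as np

def to_triplets(full_tensor, visited):
    """Flatten visited cells into rows of [row index, column index, value]."""
    t, a, s = np.nonzero(visited)  # coordinates of the visited cells, ordered by time
    return np.column_stack([a, s, full_tensor[t, a, s]])

def normal_logp(value, mu, sd):
    """Elementwise log-density of Normal(mu, sd) evaluated at value."""
    return -0.5 * np.log(2 * np.pi) - np.log(sd) - 0.5 * ((value - mu) / sd) ** 2

def sparse_loglik(value, q_mus, q_log_sds):
    """Per-example log-likelihood; each row touches exactly one table cell."""
    idx0 = value[:, 0].astype(np.int64)
    idx1 = value[:, 1].astype(np.int64)
    mu = q_mus[idx0, idx1]               # one mean per example
    sd = np.exp(q_log_sds[idx0, idx1])   # assumed log-scale spread
    return normal_logp(value[:, 2], mu, sd)

# Toy data: one (row, column, value) visit per time step.
n_rows, n_cols = 2, 3
visits = [(0, 1, 1.0), (1, 2, -0.5), (0, 1, 2.0), (1, 0, 0.0)]
full_tensor = np.zeros((len(visits), n_rows, n_cols))
visited = np.zeros_like(full_tensor, dtype=bool)
for t, (a, s, r) in enumerate(visits):
    full_tensor[t, a, s] = r
    visited[t, a, s] = True

value = to_triplets(full_tensor, visited)
q_mus = np.zeros((n_rows, n_cols))
q_log_sds = np.zeros((n_rows, n_cols))
logp = sparse_loglik(value, q_mus, q_log_sds)
print(value.shape, logp.shape)
```

Because the rows are selected with a mask rather than by checking for non-zero values, a visit that genuinely returned a reward of 0 (the last one here) is kept as an observation instead of being mistaken for an unvisited cell.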
