No sampling when transforming random variables with theano using dot product

I am putting together a network latent space model, but can’t get it to sample in pymc3. I’m afraid I’m missing something obvious to do with either theano transformations or shape parameters, but I can’t figure out which. Here is the simplest version of the model that replicates the problem.

import numpy as np
import theano.tensor as TT
import pymc3 as pm

N = 30
K = 3
# Make data
z = TT._shared(np.random.normal(5, 1, [N, K]))
xp = TT.dot(z, TT.transpose(z))
lam = xp.eval()
x_train = np.random.exponential(lam)

# Model
with pm.Model() as simple_model:
    z = pm.Normal('z', mu=0, sd=100, shape=(N, K))
    x = pm.Exponential('x', lam=TT.dot(z, TT.transpose(z)),
                       observed=x_train)

I want to estimate the latent positions z. No matter what I try, the estimate for z is always a matrix of zeros. When I do the transformation with numpy instead of theano (np.dot(z, np.transpose(z))), I get a dimension mismatch error instead. Thanks in advance for your help.
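For reference, here is a minimal sketch contrasting the two versions inside the model block (lam_ok and lam_bad are just illustrative names, not part of my actual model):

# Inside `with pm.Model():`, z is a symbolic Theano tensor, so the
# transformation has to stay in the Theano graph.
lam_ok = TT.dot(z, TT.transpose(z))     # Theano op on a Theano tensor
lam_ok2 = pm.math.dot(z, z.T)           # equivalent, using pymc3's math wrapper
# lam_bad = np.dot(z, np.transpose(z))  # the numpy version that errors out for me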

Note: it’s also possible that I’ve simplified my toy model too much, but the same issue happens when I use the model from this example. http://edwardlib.org/tutorials/latent-space-models

Edit: After more experimenting, it appears to be the dot product that breaks the model, not the shape or the transpose. Same result using pm.math.dot.

I think there are a couple of problems here:

  1. the RV z's default value is all zeros, which gives an invalid parameter lam to the Exponential (see the quick check after this list)
  2. the model is over-specified, with more parameters than observations
  3. the model is unidentifiable under the dot product (-2*-2 is the same as 2*2)
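
A quick way to see problem 1 (just a sketch, assuming the simple_model defined above):

with simple_model:
    # With z at its all-zero default, lam = z.dot(z.T) is a zero matrix,
    # which is an invalid rate for the Exponential, so the logp of x is -inf/nan.
    print(simple_model.check_test_point())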

Maybe a better prior would help, something like:

with pm.Model() as simple_model:
    z = pm.HalfNormal('z', sd=1, shape=(N, K), testval=np.random.rand(N, K))
    x = pm.Exponential('x', lam=tt.dot(z, tt.transpose(z)),
                       observed=x_train)
    trace = pm.sample(init='adapt_diag')

Yeah, those are all valid things wrong with the model as posted, but that’s because I tried to strip it down to something only a couple of lines long that still failed in the same way. The linked, valid latent space model also fails in the same way; you can replace the Exponential with a Normal both when creating the data and in the model and it fails in the same way; and a model with all the same problems as this one, but whose outcome is not the result of a dot product, will at least sample. It’s not that the above gives the wrong answer, it’s that it makes no guesses at all for z.

So for example, another model that fails:

z_train = np.random.normal(7.2, 3, (100, 3))
x_train = np.dot(z_train, z_train.T)
# Model
with pm.Model() as simple_model:
    z = pm.Normal('z', mu=0, sd=100, shape=(100, 3))
    x = pm.Normal('x', TT.dot(z, TT.transpose(z)), observed=x_train)

And so does the Edward example:

N = 30
K = 3
z = TT._shared(np.random.normal(5, 1, [N, K]))
xp = TT.sum(TT.pow(z, 2), axis=1, keepdims=True)
xp = TT.tile(xp, [1, N])
xp = xp + TT.transpose(xp) - 2 * TT.dot(z, TT.transpose(z))
xp = 1.0 / TT.sqrt(xp + TT.eye(N, N) * 100000)
lam = xp.eval()
x_train = np.random.poisson(lam)
with pm.Model() as elegans_net:
    z = pm.Normal('z', mu=0, sd=200, shape=(N, K))
    xp = TT.sum(TT.pow(z, 2), axis=1, keepdims=True)
    xp = TT.tile(xp, [1, N])
    xp = xp + TT.transpose(xp) - 2 * TT.dot(z, TT.transpose(z))
    xp = 1.0 / TT.sqrt(xp + TT.eye(N, N) * 1000000)
    x = pm.Poisson('x', mu=xp, observed=x_train)

I can sample from the model actually - what error are you seeing?


Alternatively, you can try using ADVI, which seems to give a pretty sensible local optimum:

with simple_model:
    approx = pm.fit(100000, obj_n_mc=10)

# Map the fitted approximation back to a point in the model's parameter space
# and check its logp
z_approx = simple_model.bijection.rmap(approx.params[0].eval())
simple_model.logp(z_approx)
# ==> array(-10848.99051671)

# Compare against the model logp at the true latent positions
point = simple_model.test_point
point['z'] = z_train
simple_model.logp(point)
# ==> array(-10847.51765264)
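
If it helps, one way to inspect the fit (a sketch, assuming the approx object from pm.fit above; trace_advi and z_mean are just illustrative names):

trace_advi = approx.sample(500)          # draw from the fitted approximation
z_mean = trace_advi['z'].mean(axis=0)    # posterior-mean latent positions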

I can’t reproduce my own error after restarting my environment. :tired_face:
Hours of diagnostics later, the proper latent space model runs, the first one I tried, so I can’t reproduce the error. MAP still consistently estimates a z full of zeros, and every draw on a Metropolis trace is still 0 (I often use those for testing as I build models because they’re fast). But NUTS no longer throws the error it did before. I remember it being something about empty slices, but I can’t remember specifically. I’ll post again if it comes back.
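
To be concrete about what I mean by “full of zeros”, this is the kind of check I’m running (a sketch; map_est and tr are just illustrative names):

with simple_model:
    map_est = pm.find_MAP()                    # point estimate of the latent positions
    tr = pm.sample(500, step=pm.Metropolis())  # quick-and-dirty trace for eyeballing
print(map_est['z'])            # comes back as (essentially) all zeros
print(tr['z'].mean(axis=0))    # same story for the Metropolis draws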

I’ll also post a notebook with some version of the Krivitsky, Handcock, Raftery, & Hoff (2009) latent space network model. I haven’t seen any pymc3 latent space examples, so hopefully it’ll be useful to someone else.

But thanks for your help, sorry it turned out to be nothing. For my part, at least I understand a lot more about Theano variables after all this.


Okay, I reproduced the error. This is strange but at least I feel less negligent. When I explicitly set step=pm.NUTS() I get the following error:

/home/law98/anaconda3/envs/theano_env/lib/python3.5/site-packages/numpy/core/fromnumeric.py:2957: RuntimeWarning: Mean of empty slice.
  out=out, **kwargs)

Bad initial energy, check any log probabilities that are inf or -inf, nan or very small:
x   NaN

When I do not, but leave everything else the same, it states

Auto-assigning NUTS sampler...

And then successfully samples.

I used just a few draws for testing since it’s slow, but in both cases the sample command is as follows:

trace = pm.sample(20,tune=20,step=pm.NUTS(),chains=2)

EDIT:
When you don’t specify NUTS explicitly, it prints “Initializing NUTS using jitter+adapt_diag…”
But when you use:
pm.sample(20, tune=20, step=pm.NUTS(), init='jitter+adapt_diag', chains=2)
you get the same bad initial energy error, so it isn’t just the initialization that is different.
I would have expected them to have the same default behavior. Is there a way to tell what else is different?

Yeah, this is not very clear from the docs, but passing step=... to pm.sample() inhibits init=..., and since the default value of z is all 0, that’s the reason for the Bad initial energy... error. The code I posted above with a non-zero testval should work.
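
If you really want to construct the NUTS step yourself, one option (a sketch, assuming the simple_model from above; I haven’t checked it against every pymc3 version) is to run the initializer explicitly and pass its result as the starting point:

from pymc3.sampling import init_nuts

with simple_model:
    # Run the same initializer the auto-assignment would use, then hand its
    # starting point(s) and tuned step method to pm.sample.
    start, step = init_nuts(init='jitter+adapt_diag', chains=2)
    trace = pm.sample(20, tune=20, step=step, start=start, chains=2)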


Thanks, that clears it up, and will hopefully help with sampling issues in the future.