I’ve built out the latent space model from my earlier post, but I’m still having sampling issues. Sampling is extremely slow and gets slower over time: the first couple hundred draws take about 1s each, the next thousand about 30s each, and by now I’m up to 90s per draw. With 4 chains, that works out to 6 minutes per draw, on a system where each chain appears to be using 18 cores.
I’ve already reparameterized the hierarchical component, and I’ve tightened the prior variance so that draws are constrained to the right order of magnitude to produce a graph of similar density. Those are the two fixes I’ve seen recommended, and I’m not sure what to try next.
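For reference, here’s a minimal sketch of what I mean by the reparameterization: the non-centered form, the same pattern used for the sender and receiver effects in the full model below (the variable names here are just placeholders).

import pymc3 as pm

with pm.Model():
    # Centered form (original): effect ~ Normal(0, sigma)
    # Non-centered form: sample a unit normal and scale it by sigma
    sigma = pm.HalfCauchy('sigma', beta=2.5)
    raw = pm.Normal('raw', mu=0, sd=1, shape=10)
    effect = pm.Deterministic('effect', sigma * raw)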
For comparison, a version of the model with no hierarchical component or covariates at all, just estimating the latent space, takes 30s/draw, which is still very slow but less so, especially since it's running on a much less powerful server (~2 cores/chain). It doesn’t converge either, but it does have higher acceptance probabilities and generally did a little better.
I’ve pasted code for a simulated version of the model below. It samples faster than the real data (settling around 10s/draw), but it still does not converge.
I usually like to ask a more specific question, but at this point I’m at a loss for what to try next, especially since it’s hard to tell whether poor convergence is just due to having too few samples (even a test run with only a couple hundred samples takes around 6 hours). Can anyone help me troubleshoot the model, and maybe narrow down solutions?
I’ve run a handful of short tests and finished one longer run with the real data (though the longer run used a less optimized version of the model, since I started it 72 hours ago). I get the following warnings:
There were 433 divergences after tuning. Increase `target_accept` or reparameterize.
The acceptance probability does not match the target. It is 0.06801044279154617, but should be close to 0.8. Try to increase the number of tuning steps.
There were 999 divergences after tuning. Increase `target_accept` or reparameterize.
The acceptance probability does not match the target. It is 0.09502623861687749, but should be close to 0.8. Try to increase the number of tuning steps.
The chain reached the maximum tree depth. Increase max_treedepth, increase target_accept or reparameterize.
The gelman-rubin statistic is larger than 1.4 for some parameters. The sampler did not converge.
The estimated number of effective samples is smaller than 200 for some parameters.
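The warnings themselves point at target_accept and max_treedepth; for reference, applying those suggestions would look something like this (on recent PyMC3 releases the NUTS arguments pass straight through pm.sample; older releases take them via nuts_kwargs):

with test_net:
    trace = pm.sample(draws=1000, tune=2000, chains=4,
                      target_accept=0.9,   # up from the default 0.8
                      max_treedepth=12)    # up from the default 10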
Also, whenever I try to do a traceplot, the kernel crashes. That may be a coincidence, though: I usually start a second model as soon as one finishes, so system resources are already tied up by the time I try to plot the trace.
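One workaround I still need to try is plotting only a few variables at a time, since the trace carries some big Deterministic nodes ('Sender', 'Receiver', 'dyad') along with the n-by-K latent positions. Something like the following (the keyword is varnames on my PyMC3 version, var_names on newer ones):

# Plot just the scalar coefficients to keep memory down
pm.traceplot(trace, varnames=['Sender v1', 'Sender v2',
                              'Receiver v1', 'Receiver v2',
                              'Attribute Distance', 'Shared attribute'])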
Any suggestions for next steps would be greatly appreciated.
# Create data with two individual-level variables, two dyad-level variables,
# and a latent space.
import numpy as np
import theano
import theano.tensor as TT
import pymc3 as pm
K = 3
n = 300
# True latent positions and the resulting pairwise Euclidean distance matrix
z = theano.shared(np.random.normal(0, .5, (n, K)))
xp = TT.sum(TT.pow(z, 2), axis=1, keepdims=True)
xp = TT.tile(xp, [1, n])
xp = xp + TT.transpose(xp) - 2 * TT.dot(z, TT.transpose(z))
# xp = 1.0 / TT.sqrt(xp + TT.eye(n, n) * 100000)
xp = TT.sqrt(xp + TT.eye(n, n) * .00001)  # small diagonal jitter keeps sqrt away from 0
dist = xp.eval()
# True coefficient values
b_1_s = -.08
b_2_s = .008
b_1_r = .09
b_2_r = .3
b_d1 = -.5
b_d2 = .4
# Individual-level covariates
v1 = np.random.poisson(10, (n, 1))
# v1 = np.log(v1)
v2 = np.random.binomial(1, .5, (n,1))
sender = (b_1_s * v1) + (b_2_s * v2)
sender_mat = np.tile(sender, [1, n])        # row i = sender effect of node i
receiver = (b_1_r * v1) + (b_2_r * v2)
receiver_mat = np.tile(receiver.T, [n, 1])  # column j = receiver effect of node j
# Dyadic variable: distance between the two nodes' attribute values
d1 = np.abs(np.random.normal(0,1,(n,1)))
s = np.tile(d1,[1,n])
r = np.tile(d1.T,[n,1])
adj_d1 = np.abs(s-r)
np.fill_diagonal(adj_d1,0)
# Dyadic variable: shared-attribute indicator
adj_d2 = np.random.binomial(1, .2, (n,n))
np.fill_diagonal(adj_d2,0)
adj_d2[np.triu_indices(n)]=0
adj_d2 = np.maximum(adj_d2,adj_d2.T)
dyad = b_d1 * adj_d1 + b_d2 * adj_d2
eta = dyad - dist + sender_mat + receiver_mat
eta = theano.shared(eta)
eta = (1 - TT.eye(n, n)) * eta   # zero the diagonal; self-ties are removed again below
p = TT.nnet.sigmoid(eta).eval()
adj = np.random.binomial(1, p)
np.fill_diagonal(adj, 0)
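The model references a covariate DataFrame x, which I build from the simulated covariates so the snippet runs on its own:

import pandas as pd
x = pd.DataFrame({'v1': v1.flatten(), 'v2': v2.flatten()})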
And the model:
with pm.Model() as test_net:
    # Latent space positions
    z = pm.Normal('z', mu=0, sd=10, shape=(n, K), testval=np.random.rand(n, K))
    # Matrix of pairwise latent distances
    latent = TT.sum(TT.pow(z, 2), axis=1, keepdims=True)
    latent = TT.tile(latent, [1, n])
    latent = latent + TT.transpose(latent) - 2 * TT.dot(z, TT.transpose(z))
    # latent = 1.0 / TT.sqrt(latent + TT.eye(n, n) * 10000)
    latent = TT.sqrt(latent + TT.eye(n, n) * .00001)
    # Sender effects (non-centered parameterization)
    b_1_s = pm.Normal('Sender v1', mu=0, sd=100)
    b_2_s = pm.Normal('Sender v2', mu=0, sd=100)
    sig_send = pm.HalfCauchy('Sender_sigma', beta=2.5)
    send_raw = pm.Normal('Sender_raw', mu=0, sd=1, shape=n)
    sender = pm.Deterministic('Sender',
                              b_1_s * x['v1'].values +
                              b_2_s * x['v2'].values +
                              sig_send * send_raw)
    sender_mat = TT.transpose(TT.tile(sender, [n, 1]))   # row i = sender effect of node i
    # Receiver effects (non-centered parameterization)
    b_1_r = pm.Normal('Receiver v1', mu=0, sd=10)
    b_2_r = pm.Normal('Receiver v2', mu=0, sd=10)
    sig_rec = pm.HalfCauchy('Receiver_sigma', beta=2.5)
    rec_raw = pm.Normal('Receiver_raw', mu=0, sd=1, shape=n)
    receiver = pm.Deterministic('Receiver',
                                b_1_r * x['v1'].values +
                                b_2_r * x['v2'].values +
                                sig_rec * rec_raw)
    receiver_mat = TT.tile(receiver, [n, 1])             # column j = receiver effect of node j
    # Dyad effects
    b_d1 = pm.Normal('Attribute Distance', mu=0, sd=10)
    b_d2 = pm.Normal('Shared attribute', mu=0, sd=10)
    dyad = pm.Deterministic('dyad',
                            b_d1 * adj_d1 +
                            b_d2 * adj_d2)
    # Put it all together; the large negative diagonal pushes self-tie probability to ~0
    eta = dyad - latent + sender_mat + receiver_mat
    eta = eta - (TT.eye(n, n) * 100)
    p = TT.nnet.sigmoid(eta)
    y = pm.Bernoulli('x', p=p, observed=adj)
    trace = pm.sample(200, tune=100, chains=2)
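For what it’s worth, this is how I count the divergent transitions after a run:

# Divergences recorded by NUTS during sampling
divergent = trace.get_sampler_stats('diverging')
print('Divergent transitions:', divergent.sum())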