Thank you very much for the reply. Regarding the spatial matrix, what I’m actually doing (sorry I didn’t add it in my previous post) is using the same Gaussian kernel you mentioned as normalisation:
```python
import numpy as np
import pytensor.tensor as at

# Gaussian kernel
def gk(x, ls):
    return (0.5*np.pi) * at.exp((-0.5 * x**2) / ls**2) / ls**2
```
That is the parameter gs I added in my model. I’ll edit my initial question to make that clearer.
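For concreteness, this is what evaluating that kernel looks like outside the model, in plain numpy (the distance matrix and lengthscale here are just illustrative values, not my real data):

```python
import numpy as np

# numpy version of the same kernel, for quick checks outside the model
def gk_np(x, ls):
    return (0.5 * np.pi) * np.exp((-0.5 * x**2) / ls**2) / ls**2

# toy pairwise spatial distances (stand-in for the real spaceD matrix)
spaceD_toy = np.array([[0.0, 1.0],
                       [1.0, 0.0]])
gs_check = gk_np(spaceD_toy, ls=2.0)  # same shape as spaceD_toy, decays with distance
```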
I should’ve realised that an upper triangular matrix with negative times was a problem, but it never occurred to me to use a mask to turn them to zero. Sadly, the mask method doesn’t work as expected. The equivalent Stan code runs in about 2 minutes, but when I implement the mask method I get divergences at about 40% of sampling, and then the sampler gets extremely slow:
```python
with pm.Model() as mod:
    lengthscaleS = pm.TruncatedNormal("lengthscaleS", 0, 10, lower=0)  # length scale, space
    lengthscaleT = pm.TruncatedNormal("lengthscaleT", 0, 10, lower=0)  # length scale, time
    a = pm.TruncatedNormal("a", 0, 10, lower=0)                        # excitation parameter
    gs = gk(spaceD, lengthscaleS)                                      # Gaussian kernel
    mu0 = pm.TruncatedNormal("mu0", 0, 1, lower=0)
    mu = pm.TruncatedNormal("mu", 0, 1, lower=0)
    mu_xyt = mu0 + mu*muhat
    lam0 = (a * lengthscaleT * at.exp(-timeD * lengthscaleT)) * gs
    lam = mu_xyt + (lam0 * addition_mask).sum(axis=-1)
    muSTintegral = len(xyt)

    def lp(name, lam, a, lengthscaleT, mu, mu0, time, integral):
        return (at.sum(at.log(lam)) - integral * mu
                - mu0 * space_window * time_window
                + a * at.sum(at.exp(-time * lengthscaleT)) - integral)

    y = pm.Potential("y", lp("ylp", lam, a, lengthscaleT, mu, mu0, timeD2, muSTintegral))

with mod:
    idata = pm.sample(1000, tune=1000, chains=4, cores=12)
```
```
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 12 jobs)
NUTS: [lengthscaleS, lengthscaleT, a, mu0, mu]
|██████-------| 48.14% [3851/8000 10:24<11:13 Sampling 4 chains, 5 divergences]
```
Over ten minutes seems to indicate that there’s still a problem. Note that I added a parameter called mu_xyt, which is the excitation part of the Hawkes process and is calculated from the time and space distances using an Epanechnikov kernel. This mu_xyt should be part of the parameter ll (which I named lam0), but I add mu_xyt to lam0 only after applying the mask and doing the summation, because otherwise the model breaks.
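For reference, by Epanechnikov kernel I mean the standard form; a minimal numpy sketch (the bandwidth `h` here is purely illustrative, not the value I actually use):

```python
import numpy as np

def epanechnikov(u, h=1.0):
    """Standard Epanechnikov kernel with bandwidth h; zero outside |u/h| <= 1."""
    z = u / h
    return np.where(np.abs(z) <= 1, 0.75 * (1 - z**2) / h, 0.0)

# kernel peaks at 0 and vanishes outside the bandwidth
vals = epanechnikov(np.array([-2.0, 0.0, 0.5, 2.0]))
```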
Quite possibly I misunderstood something from your answer, or maybe I applied it wrongly.
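For completeness, here is a minimal numpy sketch of the masking idea discussed above (zeroing out non-causal pairs before summing); the event times and the stand-in kernel are illustrative, not my real data:

```python
import numpy as np

# toy event times (stand-in for the real data)
t = np.array([0.0, 1.0, 2.5])

# pairwise time differences t_i - t_j; negative entries are non-causal pairs
timeD = t[:, None] - t[None, :]

# mask that zeroes out contributions from events that have not yet happened
addition_mask = (timeD > 0).astype(float)

# masked excitation sum, mirroring (lam0 * addition_mask).sum(axis=-1)
lam0 = np.exp(-np.abs(timeD))  # stand-in for the real kernel product
excitation = (lam0 * addition_mask).sum(axis=-1)  # first event gets zero excitation
```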