Adapting Hawkes process Stan code to PyMC

Thank you very much for the reply. Regarding the spatial matrix, what I’m actually doing (sorry, I didn’t add it to my previous post) is using the same Gaussian kernel you mentioned as a normalisation:

import numpy as np
import aesara.tensor as at  # pytensor.tensor in newer PyMC versions

# Gaussian kernel, applied elementwise to a distance matrix x with lengthscale ls
def gk(x, ls):
    return (0.5*np.pi)*at.exp((-0.5 * x**2)/ls**2)/ls**2

That is the gs term I added to my model. I’ll edit my initial question to make that clearer.
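
To make the inputs concrete, here is a minimal sketch of how the pairwise matrices and the mask used in the model below can be built (assuming xyt is an (N, 3) NumPy array of x, y, t sorted by time; the exact details in my code may differ slightly):

t = xyt[:, 2]
timeD = t[:, None] - t[None, :]  # pairwise time differences t_i - t_j (negative above the diagonal)
spaceD = np.sqrt(((xyt[:, None, :2] - xyt[None, :, :2])**2).sum(axis=-1))  # pairwise spatial distances
addition_mask = np.tril(np.ones((len(xyt), len(xyt))), k=-1)  # 1 for past events (j < i), 0 elsewhere

so gs = gk(spaceD, lengthscaleS) ends up being an N x N matrix of spatial kernel weights.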

I should’ve realised that an upper-triangular matrix with negative times was a problem :sweat_smile:, but it never occurred to me to use a mask to turn those entries to zero. Sadly, the mask method doesn’t work as expected. The Stan code runs in about 2 minutes, but when I implement the mask approach I get divergences at about 40% of sampling and the sampler then becomes extremely slow:

with pm.Model() as mod:
    lengthscaleS = pm.TruncatedNormal("lengthscaleS", 0, 10, lower=0)  # spatial lengthscale
    lengthscaleT = pm.TruncatedNormal("lengthscaleT", 0, 10, lower=0)  # temporal lengthscale (used as the decay rate)
    a = pm.TruncatedNormal("a", 0, 10, lower=0)  # excitation parameter
    gs = gk(spaceD, lengthscaleS)  # Gaussian kernel over the pairwise spatial distances
    mu0 = pm.TruncatedNormal("mu0", 0, 1, lower=0)
    mu = pm.TruncatedNormal("mu", 0, 1, lower=0)
    mu_xyt = mu0 + mu*muhat  # muhat is precomputed with an Epanechnikov kernel (see note below)
    lam0 = (a * lengthscaleT * at.exp(-timeD * lengthscaleT)) * gs  # pairwise triggering contributions
    lam = mu_xyt + (lam0 * addition_mask).sum(axis=-1)  # mask zeroes the non-causal (negative-time) pairs
    muSTintegral = len(xyt)  # number of events, used in the integral terms below

    def lp(name, lam, a, lengthscaleT, mu, mu0, time, integral):
        # sum of log-intensities minus the approximated integral of the intensity
        return (at.sum(at.log(lam)) - integral * mu - mu0 * space_window * time_window +
                a * at.sum(at.exp(-time*lengthscaleT)) - integral)

    y = pm.Potential("y", lp("ylp", lam, a, lengthscaleT, mu, mu0, timeD2, muSTintegral))
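
For clarity, the Potential is meant to implement the usual point-process log-likelihood,

$$\log L = \sum_i \log \lambda(s_i, t_i) \;-\; \int_W \int_0^T \lambda(s, t)\, dt\, ds,$$

with the background and excitation integrals approximated by the closed-form terms inside lp.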

with mod:
    idata = pm.sample(1000, tune=1000, chains=4, cores=12)
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 12 jobs)
NUTS: [lengthscaleS, lengthscaleT, a, mu0, mu]
 |██████-------| 48.14% [3851/8000 10:24<11:13 Sampling 4 chains, 5 divergences]

Over ten minutes seems to indicate that there’s still a problem. Note that I added a term called mu_xyt, which is the excitation part of the Hawkes process and is calculated from time and space distances using an Epanechnikov kernel. This mu_xyt should be part of the ll term (which I named lam0), but I add mu_xyt to lam0 after applying the mask and doing the summation, because otherwise the model breaks.
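
For reference, by Epanechnikov kernel I mean something along these lines (just a sketch; ek and bw are illustrative names, and my actual bandwidth handling may differ):

def ek(x, bw):
    # Epanechnikov kernel: 0.75*(1 - u**2) for |u| <= 1, zero outside the support
    u = x / bw
    return np.where(np.abs(u) <= 1, 0.75 * (1 - u**2) / bw, 0.0)

Since muhat is precomputed from the data, only mu0 and mu are sampled for that part of the intensity.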

Quite possibly I misunderstood something in your answer, or maybe I implemented something incorrectly.