Yes, you should add mu_xyt after the summation. Referencing equation 4 from the paper’s appendix 1, we are computing:
$$\lambda(s, t) = \mu(x, y, t) + \theta \sum_{i : t_i < t} \omega \, e^{-\omega (t - t_i)} \, k\big((x, y), (x_i, y_i)\big)$$

You can see that the term $\mu(x, y, t)$ is not inside the summation, so we compute the two parts separately and add them at the end.
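In array terms, here is a minimal numpy sketch of that point (the names are just placeholders for whatever you have: mu_xyt for the background evaluated at each event, timeD for the pairwise time differences, space_kernel for the precomputed k values, and mask for the lower-triangular mask):

```python
import numpy as np

def intensity_at_events(mu_xyt, timeD, space_kernel, theta, omega, mask):
    # Triggering term omega * exp(-omega * (t - t_i)) * k(...), written with the
    # convention (see below) that timeD holds past-minus-current differences,
    # so the masked entries are negative and exp() decays.
    trigger = omega * np.exp(omega * timeD) * space_kernel
    # Sum only over past events (the mask), then add mu once per event, outside the sum.
    return mu_xyt + theta * np.where(mask, trigger, 0.0).sum(axis=1)
```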
As for sampling, there are a bunch of steps we need to check before we even look at the model. We should make sure that all the data preprocessing steps match, and then check that the masking method is computing the same thing as the naive loop method. I checked their replication code and did the following preprocessing steps:
```python
import pandas as pd
import os
os.environ['USE_PYGEOS'] = '0'
import geopandas as gpd
raw_df = pd.read_csv('https://raw.githubusercontent.com/flaxter/gunshot-contagion/master/WashingtonDC_Incidents_2006-2013_Raw_Data_2.csv')
df = raw_df.iloc[:, 2:6].copy()
df.columns = ['datetime', 'Type', 'long', 'lat']
df = df[df.lat > 37]
df.datetime = pd.to_datetime(df.datetime, format='%m/%d/%y %H:%M')
holidays = ["07-01","07-02", "07-03", "07-04", "07-05", "07-06", "12-29", "12-30", "12-31", "01-01", "01-02"]
def zero_pad(s, n):
    n_zeros = n - len(s)
    return '0' * n_zeros + s

def check_for_holiday(x):
    # The holidays above are month-day strings, so build the key as MM-DD
    day = zero_pad(str(x.day), 2)
    month = zero_pad(str(x.month), 2)
    date_str = f'{month}-{day}'
    return date_str in holidays
# Drop holidays (Appendix 2)
df = df[~df.datetime.apply(check_for_holiday)]
# Sort by dates (forces the mask to be lower-triangular)
df = df.sort_values(by='datetime')
Time = df.datetime.astype('int64')
assert Time.is_monotonic_increasing
# Hours since first observation
Time = (Time - min(Time)) / 1e9 / 60 / 60
# No idea what the CRS for this data is, but EPSG:4326 is used by census shapefiles so it's my guess
df = gpd.GeoDataFrame(df, geometry=gpd.points_from_xy(df.long, df.lat), crs="EPSG:4326")
# EPSG:26985 is projected meters for the DC area; convert to km and scale
xy = df.to_crs('EPSG:26985').geometry.get_coordinates().apply(lambda x: (x - x.min()) * 1e-4)
```
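For reference, the pairwise objects the rest of this discussion refers to can be built from Time and xy along these lines. This is just my guess at the definitions: timeD is negative below the diagonal (matching the authors' convention), and I'm assuming spaceD holds squared distances, so swap in whatever your kernel expects.

```python
import numpy as np

t = Time.to_numpy()
x = xy.x.to_numpy()
y = xy.y.to_numpy()

timeD = t[None, :] - t[:, None]     # entry (i, j) = t_j - t_i, negative whenever j < i
spaceD = (x[:, None] - x[None, :])**2 + (y[:, None] - y[None, :])**2  # squared distances
# Strictly-lower-triangular mask: event j is "in the past" of event i only when j < i,
# which holds because we sorted by datetime above.
mask = np.tril(np.ones((len(t), len(t)), dtype=bool), k=-1)
```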
One potential problem I noticed doing this exercise is that the time variable (hours since the first observation) still spans 7 years, so the values are quite large (median 32,907), and that’s overflow city inside the exp function. I’d rescale to something smaller, like days or weeks, and work out the algebra to convert the fitted parameters back to the paper’s units. I don’t know whether Stan does anything under the hood to handle large values in exp, but these overflows are an easy place to look when chasing divergences.
I missed that the authors make their timeD negative by definition, which is why I was getting overflows. If everything is correctly negative on your end, you can ignore all these comments.
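If you do rescale, the conversion back to the paper’s units is just a change of rate. For example, with weeks (a sketch, not their code):

```python
HOURS_PER_WEEK = 24 * 7
Time_weeks = Time / HOURS_PER_WEEK   # fit the model on a weekly time scale

# exp(-omega_hourly * dt_hours) == exp(-(omega_hourly * 168) * dt_weeks), so a decay
# rate fitted on the weekly scale maps back to the paper's hourly scale as
# omega_hourly = omega_weekly / HOURS_PER_WEEK.
```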
Anyway, using this data I confirmed that the naive loop and the masking method are computationally equivalent, so a computation error isn’t the problem. In the process I found that these computations take a long time with numpy, so you should try nutpie to see if you get any speedup from JIT-compiling everything with numba.
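For nutpie, the cheapest things to try (assuming a reasonably recent PyMC and that your model context is called model) are the nuts_sampler argument or compiling with nutpie directly:

```python
import pymc as pm
import nutpie

# Option 1: let pm.sample hand the work to nutpie
with model:
    idata = pm.sample(nuts_sampler="nutpie")

# Option 2: compile the model with nutpie and sample from the compiled object
compiled = nutpie.compile_pymc_model(model)
idata = nutpie.sample(compiled, chains=4)
```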
I will say the masking method is quite computationally wasteful, especially at the scale of this data (we’re computing values for 625,500,765 elements that will be masked away). There might be a better way to do this, but I’d try JIT first to see if compiler magic can solve the problem for you. Using a scan with JAX is my next idea, but it might also be possible to ravel the lower triangles of timeD and spaceD and just work with those vectors by using some clever indexing.
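On the raveling idea, np.tril_indices gives you the strictly-lower-triangle entries as flat vectors, and the per-event sums can be recovered by grouping on the row index. A plain-numpy sketch of the indexing (spatial_kernel is a stand-in for whatever k you use; inside the model you’d express the same thing with pytensor ops):

```python
import numpy as np

N = timeD.shape[0]
rows, cols = np.tril_indices(N, k=-1)   # only pairs (i, j) with j < i, i.e. past events
timeD_flat = timeD[rows, cols]          # N*(N-1)/2 values instead of N*N
spaceD_flat = spaceD[rows, cols]

# Do the elementwise work on vectors, then sum per event i by grouping on `rows`
trigger_flat = omega * np.exp(omega * timeD_flat) * spatial_kernel(spaceD_flat)
per_event_sum = np.zeros(N)
np.add.at(per_event_sum, rows, trigger_flat)
```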
I would also reduce the number of cores you assign to pm.sample so it just equals the number of chains; check the most recent post in the pinned FAQ about how multiprocessing works with the sampler. I’m not sure which routines are used for the large elementwise matrix operations, but if they parallelize and dispatch themselves, that might be causing problems.
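Concretely, something along these lines (chains=4 is just an example), so the sampler’s worker processes aren’t competing with whatever threading the big elementwise operations spawn on their own:

```python
with model:
    idata = pm.sample(chains=4, cores=4)   # cores matched to the number of chains
```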
My tl;dr is: re-scale the time variable to help with divergences and try to use nutpie to speed up sampling.