Adding Covariates to log-Gaussian Cox process

I’m working to model a spatial distribution. It’s a repeated process in which events occur in the X-Y plane, and I’d like to model the spatial distribution of those events. I think this would be a good use case for a log-Gaussian Cox process (with a latent Gaussian process and a Poisson likelihood), as shown in the pymc tutorial on it.

I’m able to fit the model to the data and it looks fine. However, I’d like to use some predictors to inform the model, so that the resulting pdf can change based on inputs (e.g. under conditions A, we might expect higher counts in certain X-Y bins). Since the process requires binning the data, though, it’s not clear to me how to add covariates to the formulation of this model. Is there some standard way? Searching online hasn’t proved fruitful.
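For concreteness, the binned setup I have in mind looks roughly like the sketch below (numpy only; the grid size, length-scale, and seed are all made up for illustration): a latent GP over bin centres, exponentiated into an intensity, and Poisson counts per bin.

```python
import numpy as np

rng = np.random.default_rng(0)

# bin centres on a hypothetical 10x10 grid over [0, 10] x [0, 10]
edges = np.linspace(0, 10, 11)
cx = (edges[:-1] + edges[1:]) / 2
gx, gy = np.meshgrid(cx, cx)
pts = np.column_stack([gx.ravel(), gy.ravel()])   # 100 bin centres

# squared-exponential covariance over the bin centres (length-scale 2, made up)
d2 = ((pts[:, None, :] - pts[None, :, :]) ** 2).sum(-1)
K = np.exp(-0.5 * d2 / 2.0**2) + 1e-6 * np.eye(len(pts))

# latent GP f, intensity exp(f), Poisson counts per bin
f = np.linalg.cholesky(K) @ rng.normal(size=len(pts))
lam = np.exp(f)
counts = rng.poisson(lam).reshape(10, 10)
```

This is just a forward simulation of the generative story, not the fitted model; the question is where covariates would enter it.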

I’m also happy to hear if there’s a better-suited model for this - I haven’t done much work in the spatial stats realm. The emphasis is definitely prediction, but a full pdf is necessary, not just point predictions.


It sounds like you have, for each point observation, some vector of covariates X that goes along with the point coordinates, right? I think taking the mean or median of the within-bin covariate values is probably the most straightforward thing to do, though I can’t give you a theoretical justification for it. I think that’s what the authors of this example do. It looks like they made an error in their problem formulation, though: the intensity surface should read \lambda(s) = \alpha + \beta X(s) + u(s), with the covariate term \beta X(s) rather than \beta X(u).
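A minimal sketch of that within-bin averaging, assuming numpy/scipy and a single made-up covariate (the grid, sample size, and covariate are all hypothetical):

```python
import numpy as np
from scipy.stats import binned_statistic_2d

rng = np.random.default_rng(0)

# hypothetical events, each with one scalar covariate observed at the point
x = rng.uniform(0, 10, 500)
y = rng.uniform(0, 10, 500)
cov = 0.3 * x + rng.normal(0, 0.5, 500)   # made-up covariate values

edges = np.linspace(0, 10, 11)            # 10x10 grid

# per-bin event counts: the Poisson response of the LGCP
counts, _, _ = np.histogram2d(x, y, bins=[edges, edges])

# per-bin mean covariate: this plays the role of X(s) in the log-intensity
xbar, _, _, _ = binned_statistic_2d(
    x, y, cov, statistic="mean", bins=[edges, edges])

# empty bins have no covariate value; imputing the overall mean is one crude fix
xbar = np.where(np.isnan(xbar), np.nanmean(xbar), xbar)
```

`counts` and `xbar` can then be flattened and fed into the binned Poisson-GP model as the response and a regressor, respectively.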

Conceptually, the more elegant thing to do would be to model each covariate as a random surface (though not as a point process itself, just a standard GP), and then model the correlation structure between all P covariates and the intensity function as a (P+1)-dimensional multivariate Gaussian process. I’m skeptical that this would be much better in terms of getting an accurate and reliable predictive PDF, however.
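To make the multi-output idea concrete, one common construction is an intrinsic coregionalization model, where the joint covariance over the P covariate surfaces plus the log-intensity surface is a Kronecker product of a (P+1)x(P+1) output covariance B and a shared spatial kernel. A numpy-only sketch (all sizes and parameters here are made up, and a real model would of course infer them rather than fix them):

```python
import numpy as np

rng = np.random.default_rng(2)
P = 2                       # hypothetical number of covariate surfaces
n = 30                      # grid points (1-D grid for brevity; 2-D is analogous)
s = np.linspace(0, 10, n)

# spatial squared-exponential kernel shared by all outputs
d = np.abs(s[:, None] - s[None, :])
K_s = np.exp(-0.5 * (d / 2.0) ** 2)

# (P+1)x(P+1) positive-definite matrix coupling the covariates and log-intensity
A = rng.normal(size=(P + 1, P + 1))
B = A @ A.T + np.eye(P + 1)

# intrinsic coregionalization: full covariance over all outputs jointly
K_full = np.kron(B, K_s) + 1e-6 * np.eye((P + 1) * n)

# one joint draw: rows = outputs (covariate 1..P, then u), cols = grid points
draw = np.linalg.cholesky(K_full) @ rng.normal(size=(P + 1) * n)
surfaces = draw.reshape(P + 1, n)
```

The cubic cost in (P+1)*n grid points is exactly why this tends to be expensive in practice.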


Yeah, I thought about that option. I have concerns about it: in high-occupancy bins you roll a lot of info up into one number, while low-occupancy bins are represented by just a few observations. Might be worth just giving it a shot, though. Thanks for the reference link.

That’s really interesting. I love the idea, but it also sounds like it might melt some processors trying to fit :)

Thanks for the thoughts! Very much appreciated.