Analyzing EEG / MEG data with PyMC3

Hi,

just to provide some background: I am a scientist working with EEG / MEG data. I recently came across PyMC3 and was wondering whether we can apply this to our data.

Probably the most common use case is this: I collected data from N participants in a paradigm that has 2 conditions. The data is sampled from a number of sensors (102 magnetometers in my case). Because the data is very noisy, I average over something like 100 trials within each of the conditions for each participant separately.

So, what I end up with is a matrix with the shape of (n_participants, n_channels, n_samples) for each condition. Because I am interested in the difference between the conditions, I use the difference of the two matrices for further analysis.

The hypothesis I want to test is that at some channels and some samples the mean of the differences over all participants is not 0.

I can quite safely assume that the differences follow a Student-t distribution. So I tried to come up with a model, using the models defined in the BEST toolbox as a template.

This is what I came up with:

import numpy as np
import pymc3 as pm

# normalisation seems to be needed because the values are in the range of 1e-15
# and that seems to be a problem for the sampling algorithm
norm_factor = np.std(data_for_model)
data_for_model = data_for_model / norm_factor

nu_min = 2.5
nu_mean = 30

with pm.Model() as model:
    mu = pm.Normal('mu', mu=0, sigma=1, shape=(n_channels, n_samples)) # we want to estimate the mean for each channel and sample
    std = pm.Uniform('std', lower=1e-3, upper=1e3, shape=(n_channels, 1)) # but we can assume that the std deviation is the same within each channel
    nu = pm.Exponential('nu', 1 / (nu_mean - nu_min)) + nu_min # shifted exponential prior on the degrees of freedom
    modeled_data = pm.StudentT('modeled', mu=mu, nu=nu, sigma=std, observed=data_for_model, total_size=data_for_model.shape)

    trace = pm.sample(1000)

Here are my questions:

  1. Does this make sense to you? Or did I miss something important? This is my first attempt…
  2. It seems that it is important to scale the really tiny values because otherwise, I get lots of errors from the sampler. Does that make sense?
  3. Any way to speed this up? It takes like 15 hours on my laptop. I could send it to a quite powerful cluster that has ~200 cores, but it seems that the cores argument to pm.sample only increases the number of chains, so it would not finish any quicker. I already looked into providing a multiprocessing context, but I would need to limit the number of cores available to it and I could not find out how to do this.

Any help is highly appreciated! Thanks in advance!

Thomas


Hi @thht

if you don't mind the opinion of a PyMC3 novice, here are some ideas:

  1. You might want to consider a more informative prior for the scale parameter; at the moment it is extremely wide. I usually go for a HalfCauchy distribution. You can give it a small scale if you are afraid that large values might cause problems (see the sketch after this list).
  2. Given the nature of the EEG signal (if I recall correctly from my master's), you might want to model the correlation between electrodes.
  3. It might be overkill here, but PyMC3 offers approximate variational inference, which can provide a considerable speed-up.
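
For item 1, a minimal sketch of what I mean, reusing the variable names from your model (the beta=1.0 is just an illustrative choice, not a recommendation):

with pm.Model() as model:
    mu = pm.Normal('mu', mu=0, sigma=1, shape=(n_channels, n_samples))
    # HalfCauchy puts most prior mass at small scales but keeps heavy tails
    std = pm.HalfCauchy('std', beta=1.0, shape=(n_channels, 1))
    nu = pm.Exponential('nu', 1 / (nu_mean - nu_min)) + nu_min
    modeled_data = pm.StudentT('modeled', mu=mu, nu=nu, sigma=std,
                               observed=data_for_model)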

Finally,
can you expand a bit on the nature of n_samples? I struggle to picture the experimental design (I was expecting a (n_participants, n_channels) matrix).


hi,
thanks for the input.

  1. What do you mean by the “scale” parameter? The std?
  2. Yes, you are right. And for the MEG, I actually have a noise covariance matrix, so I know how the channels covary/correlate. But I have no idea how to model that and/or use that strong prior information. Do you have any idea?
  3. Actually, using variational inference with minibatches is the way I do it now (roughly as sketched below). It decreases the computation time from ~20 hours to 20 minutes.
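
Roughly, the minibatch ADVI setup looks something like this (a minimal sketch based on the model above; the batch size and number of iterations are just placeholders):

batch = pm.Minibatch(data_for_model, batch_size=16)  # slices along the participant axis

with pm.Model() as model:
    mu = pm.Normal('mu', mu=0, sigma=1, shape=(n_channels, n_samples))
    std = pm.Uniform('std', lower=1e-3, upper=1e3, shape=(n_channels, 1))
    nu = pm.Exponential('nu', 1 / (nu_mean - nu_min)) + nu_min
    # total_size tells ADVI how to rescale the minibatch likelihood to the full data
    modeled_data = pm.StudentT('modeled', mu=mu, nu=nu, sigma=std,
                               observed=batch, total_size=data_for_model.shape)

    approx = pm.fit(n=30000, method='advi')
    trace = approx.sample(1000)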

Concerning your question regarding the shape of the matrix: EEG or MEG data is sampled at a regular interval, like 1000 Hz. It is a strength of these methods that they allow you to infer when an effect happens. So, let's say I want to analyze the cortical response to an auditory stimulus (e.g., a sound) compared to a visual one (an image, for instance). I would present the audio ~100 times and the visual ~100 times as well. I have triggers in my data, so I know exactly when this stimulation happened. I would then cut out the data around these triggers (like 300 ms before the stimulation to 700 ms after). Because of the very low signal-to-noise ratio, I would then average the 100 visual and 100 auditory trials within each participant. Assuming a sampling rate of 1000 Hz, I would have 1000 samples per channel for each condition and participant.

This leaves me with an (n_channels, n_samples) matrix per participant and condition, which reduces to just one such matrix per participant (the condition difference) before I apply the statistics, because I am interested in at which channels (where?) and at which samples (when?) there is a difference between the two conditions.

So, in the end, I concatenate all these matrices for all participants, which leaves me with said (n_participants, n_channels, n_samples) matrix.
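
In code, the shapes look something like this (a toy sketch with random numbers instead of real recordings, and made-up sizes):

# hypothetical sizes, just to illustrate the shapes
n_participants, n_trials, n_channels, n_samples = 20, 100, 102, 1000

# per participant and condition: (n_trials, n_channels, n_samples) epochs
epochs_auditory = np.random.randn(n_trials, n_channels, n_samples)
epochs_visual = np.random.randn(n_trials, n_channels, n_samples)

# average over trials, then take the condition difference -> (n_channels, n_samples)
evoked_diff = epochs_auditory.mean(axis=0) - epochs_visual.mean(axis=0)

# in reality there is one such matrix per participant; stacked, this gives
# the (n_participants, n_channels, n_samples) matrix that goes into the model
data_for_model = np.stack([evoked_diff] * n_participants, axis=0)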

Does this make it clearer?

Hi

yes that is very clear now.

  1. Yes, by scale parameter I meant the std; it might be quite challenging to sample from a uniform prior over the range 0.001 to 1000.

  2. I only played around with it a couple of times, but I believe you could use a Multivariate Normal with your noise matrix as the covariance matrix (see the sketch after this list). The lecture by Richard McElreath on Gaussian Processes is the most accessible resource I have found so far on the subject.

  3. I suspect the same logic also applies to samples: since they represent different points in time, you might expect some form of temporal structure to be present. This would probably require a spatio-temporal model, but that is way out of my depth.
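
For item 2, a minimal sketch of what I have in mind, for a single time sample (noise_cov and data_at_sample are placeholder names for the known channel covariance and the (n_participants, n_channels) data at one sample; I have not tested this on real data):

with pm.Model() as mv_model:
    # one mean per channel at this time sample
    mu = pm.Normal('mu', mu=0, sigma=1, shape=n_channels)
    # channels are modeled jointly, with the known noise covariance
    y = pm.MvNormal('y', mu=mu, cov=noise_cov, observed=data_at_sample)

    trace = pm.sample(1000)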


Hi, I'm more than a year late, but just in case you're still interested in this topic I'll leave a comment here. If you're interested in the difference between conditions, the best way to conduct inference in that respect, and to avoid a big loss of information (i.e. to keep the uncertainty acknowledged), would be to include both conditions in your model and then take the difference between the posteriors of each condition (ex-post, rather than taking the difference between conditions in the observed signal). In that case you could do something like:

with pm.Model() as model:
    mu = pm.Normal('mu', mu=0.0, sigma=1.0, shape=(n_channels, n_samples, n_conditions)) # one mean per channel, sample, and condition
    std = pm.HalfNormal('std', 1.0)
    nu = pm.Exponential('nu', 1 / (nu_mean - nu_min)) + nu_min # nu_min, nu_mean as defined earlier in the thread
    y = pm.StudentT('y', mu=mu, nu=nu, sigma=std, observed=data)

    trace = pm.sample(1000)
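
You can then get the posterior of the difference ex-post from the samples of mu, e.g. like this (assuming the two conditions are indexed along the last axis):

# posterior samples of mu: shape (n_draws, n_channels, n_samples, n_conditions)
mu_samples = trace['mu']
# posterior of the condition difference at every channel and sample
diff = mu_samples[..., 0] - mu_samples[..., 1]
diff_mean = diff.mean(axis=0)                       # posterior mean of the difference
diff_ci = np.percentile(diff, [2.5, 97.5], axis=0)  # equal-tailed 95% interval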

Ideally, however, you would want the mu parameter to be a linear regression formula (i.e. with an intercept and a slope), so it tells you the change induced as a function of condition; that would also allow you to add varying intercepts and/or slopes by stimulus and participant. I have taken an approach like that here: https://onlinelibrary.wiley.com/doi/full/10.1111/ejn.15470 (which still needs much improvement, though). If you're interested in accounting for the time domain across the epoch, you could have a look at this: Bayesian models for event-related potentials time-series and electrode correlations estimation | bioRxiv. That implements a method for estimating covariance/correlation matrices via LKJ priors. These could also be implemented in more conventional linear regression models; I would also recommend a lecture by McElreath on this topic, though a different one: Statistical Rethinking 2022 Lecture 14 - Correlated Varying Effects - YouTube.
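
As a rough sketch of that idea (the names and the 0/1 condition coding are purely illustrative; here the data would be stacked as (n_observations, n_channels, n_samples), one row per participant-condition average, with condition_idx a 0/1 array of length n_observations):

with pm.Model() as reg_model:
    intercept = pm.Normal('intercept', mu=0.0, sigma=1.0, shape=(n_channels, n_samples))
    slope = pm.Normal('slope', mu=0.0, sigma=1.0, shape=(n_channels, n_samples))
    # slope is the condition effect at every channel and sample
    mu = intercept + slope * condition_idx[:, None, None]
    std = pm.HalfNormal('std', 1.0)
    nu = pm.Exponential('nu', 1 / (nu_mean - nu_min)) + nu_min
    y = pm.StudentT('y', mu=mu, nu=nu, sigma=std, observed=stacked_data)

    trace = pm.sample(1000)

The posterior of slope then directly answers the where/when question about the condition difference.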

Sorry if I’m too late and this is irrelevant now. But maybe someone else may find it useful as well.
