Can I split multidimensional data to parallelize fitting?

Let’s say I have two-dimensional data with the shape (n_observations, n_channels), and I have a model like this:

import pymc3 as pm

with pm.Model() as model:
    # One location and one scale parameter per channel
    mu = pm.Normal('mu', mu=0, sigma=1, shape=(n_channels,))
    std = pm.Uniform('std', lower=1e-1, upper=1e1, shape=(n_channels,))
    # Shifted exponential prior on the degrees of freedom (shared across channels)
    nu = pm.Exponential('nu', 1 / (nu_mean - nu_min)) + nu_min

    modeled_data = pm.StudentT('modeled', mu=mu, nu=nu, sigma=std, observed=data)

What happens if I use this model twice: once with the first half of n_channels and in a second step with the second half? Will I get more-or-less the same results?

Or, in other words: are the channels treated independently, or is some kind of relationship between them used?

Thanks in advance!
Thomas

What happens if I use this model twice: once with the first half of n_channels and in a second step with the second half? Will I get more-or-less the same results?

If you were to sample this model twice as you described, you’d get back two different posteriors over the parameters, \pi_1 and \pi_2, one conditioned on the first subset and one conditioned on the second. Unfortunately, there isn’t a straightforward way to combine the samples from these two posteriors into samples from a combined posterior “\pi_3 = \pi_1 + \pi_2”; see here for a discussion.

So no, you can’t parallelize like this. The closest thing I can think of (which doesn’t quite help with parallelization) is that you can, in theory, use your posterior \pi_1 over the first subset as your prior for the second subset. In practice you have to do something like a KDE in between these steps, since you don’t actually have \pi_1 itself but only samples from it, which can introduce approximation error. See this notebook for an example. This might help if you have a massive amount of data (too much to run all at once) and don’t care how long it takes to get the job done, but the steps have to run sequentially, so this isn’t really a method to get a speedup.
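
As a rough illustration of that sequential-updating idea, here is a minimal sketch of the KDE step using pm.Interpolated (which is univariate, so you build one such prior per scalar parameter). The from_posterior helper and its grid padding follow the pattern of the linked notebook and are only illustrative:

import numpy as np
from scipy import stats
import pymc3 as pm

def from_posterior(name, samples, n_points=100):
    # Fit a KDE to the posterior samples of one scalar parameter and wrap
    # the resulting density as an Interpolated distribution, usable as a prior.
    smin, smax = samples.min(), samples.max()
    width = smax - smin
    x = np.linspace(smin - 0.3 * width, smax + 0.3 * width, n_points)
    y = stats.gaussian_kde(samples)(x)
    return pm.Interpolated(name, x, y)

# Then, inside the model for the second subset of the data:
# mu_0 = from_posterior('mu_0', trace_first['mu'][:, 0])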

OK, thanks for the quick reply. I would like to ask for some more clarification.

As you can see from the code, what I want to estimate is actually one posterior per channel, at least for mu and std. I would also be OK with adding the same shape parameter to the nu variable.

So the question is whether these posteriors somehow depend on each other, i.e. if I estimated them for channel 1 and channel 2 separately, would the resulting two posteriors be the same as if I estimated them together (but still got two posteriors because of the shape parameter)?

I have actually tried this out on real data, and the difference between calculating everything at once and calculating it in separate calls is roughly the same as when I just do the “all at once” calculation twice with different random seeds.

But please bear in mind that I am totally new to this field and might be missing something here. This is why I am asking.

Oh, my bad, I see what you’re asking now. Yes, these should be entirely separate, as long as nu_mean and nu_min are constants independent of the data and you give nu a per-channel shape as well (a single shared nu would couple the channels). Your model as written treats the dimensions as independent (i.e. diagonal covariance). The posterior factorizes as

\pi(\theta_{1 : n} | c_1, \ldots, c_m, \ldots, c_n) = \pi(\theta_{1:m} | c_1, \ldots, c_m) \cdot \pi(\theta_{m+1:n} | c_{m+1}, \ldots, c_n)

where the c_i are your vectors of channel data and \theta_j = [\mu_j, \sigma_j, \nu_j]. So sampling \pi(\theta_{1:m} | c_1, \ldots, c_m) and \pi(\theta_{m+1:n} | c_{m+1}, \ldots, c_n) independently and concatenating the resulting arrays along the channel dimension should give you samples equivalent to those you’d get from sampling \pi(\theta_{1:n} | c_1, \ldots, c_n) directly (using all channels at once).
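
For concreteness, a minimal sketch of that split-and-concatenate workflow (assuming data, nu_mean, and nu_min are defined as above, and giving nu a per-channel shape; fit_channels is just an illustrative helper name):

import numpy as np
import pymc3 as pm

def fit_channels(data_subset):
    # data_subset has shape (n_observations, k) for a subset of k channels
    k = data_subset.shape[1]
    with pm.Model():
        mu = pm.Normal('mu', mu=0, sigma=1, shape=(k,))
        std = pm.Uniform('std', lower=1e-1, upper=1e1, shape=(k,))
        nu = pm.Exponential('nu', 1 / (nu_mean - nu_min), shape=(k,)) + nu_min
        pm.StudentT('modeled', mu=mu, nu=nu, sigma=std, observed=data_subset)
        return pm.sample()

m = n_channels // 2
trace_a = fit_channels(data[:, :m])  # first half of the channels
trace_b = fit_channels(data[:, m:])  # second half

# Per-channel posterior samples concatenate along the channel axis
mu_samples = np.concatenate([trace_a['mu'], trace_b['mu']], axis=1)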

Thanks! As I wrote in the other thread, I am evaluating PyMC3 for EEG and MEG analysis. Doing it like this would already be a huge leap forward for the community.

However, you mentioned the covariance matrix being diagonal: this is clearly not the case for our sensors; in fact, they are highly correlated. Currently this is taken into account only indirectly, if at all.

So: If I know the noise-covariance matrix of my channels, how would I need to change the above model in order to reflect this?

I am aware that in this case, I would not be able to split the computation anymore…

Assuming you know the exact covariance matrix \Sigma a priori, you would just need to plug it into a model using a multivariate Student-T distribution; in PyMC3 this is the pm.MvStudentT distribution. I may not have the shapes quite right, but your model would look something like:

import pymc3 as pm

covariance_matrix = ...  # your fixed covariance matrix
with pm.Model() as model:
    mu = pm.Normal('mu', mu=0, sigma=1, shape=(n_channels,))
    nu = pm.Exponential('nu', 1 / (nu_mean - nu_min)) + nu_min
    likelihood = pm.MvStudentT("modeled", mu=mu, nu=nu, cov=covariance_matrix,
                               observed=data)

Notice that std has disappeared from the model; that information is now embedded in the (fixed) covariance matrix.

If you want to infer the variances, you can relax this model a bit: instead of fixing a covariance matrix, fix a prescribed correlation matrix \Pi between the channels, define std as before, and scale the correlation matrix by std to recover the covariance matrix. If v = [v_1, \ldots, v_n] is your vector of standard deviations, this looks like

\Sigma = \text{diagonal}(v) \, \Pi \, \text{diagonal}(v)

where by \text{diagonal}(\cdot) I mean the diagonal matrix whose diagonal entries are given by the input vector.
In code this is something like

import pymc3 as pm
import theano.tensor as tt

correlation = ...  # your fixed correlation matrix
with pm.Model() as model:
    mu = pm.Normal('mu', mu=0, sigma=1, shape=(n_channels,))
    nu = pm.Exponential('nu', 1 / (nu_mean - nu_min)) + nu_min
    std = pm.Uniform('std', lower=1e-1, upper=1e1, shape=(n_channels,))
    # Sigma = diag(std) @ correlation @ diag(std)
    std_mtx = tt.nlinalg.diag(std)
    covariance_matrix = tt.nlinalg.matrix_dot(std_mtx, correlation, std_mtx)
    likelihood = pm.MvStudentT("modeled", mu=mu, nu=nu, cov=covariance_matrix,
                               observed=data)

If you really wanted to, you could take things one step further and infer the entire covariance matrix; see this example for how to do that. But the problem may become too large, since, as you mentioned, as soon as you add in the covariance you can no longer split the computation across channels.
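
For reference, a minimal sketch of that fully-inferred-covariance variant using PyMC3’s LKJ Cholesky prior (the eta and sd_dist values here are illustrative assumptions, not recommendations):

import pymc3 as pm

with pm.Model() as model:
    mu = pm.Normal('mu', mu=0, sigma=1, shape=(n_channels,))
    nu = pm.Exponential('nu', 1 / (nu_mean - nu_min)) + nu_min
    # LKJ prior on the Cholesky factor of the full covariance matrix
    packed_chol = pm.LKJCholeskyCov('packed_chol', n=n_channels, eta=2.0,
                                    sd_dist=pm.HalfCauchy.dist(2.5))
    chol = pm.expand_packed_triangular(n_channels, packed_chol, lower=True)
    likelihood = pm.MvStudentT("modeled", mu=mu, nu=nu, chol=chol,
                               observed=data)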

Hope this helps!

Wow, this is just great. You have saved me the couple of days it would have taken me to figure this out by myself, and you have also advanced my understanding of PyMC3 a lot.

Thank you so much!!
