i acquired data using an MEG / EEG from a number of participants. for each participant, i have a matrix of data with the shape
(n_channels, n_samples), where
n_channels is the number of electrodes or magnetometers i acquired the data from and
n_samples are voltages or magnetic field strength that i measured every
x ms (typically 1ms or 10ms)…
the data in the matrix are averaged data from a number of events that all happened at the same time. for instance, i know that at sample 20, an image was shown to the participants.
i thus concatenate the data over all participants, so i have a 3d matrix of the shape
(n_participants, n_channels, n_samples). What I want is an estimate of the mean at each of the samples and channels.
with the help of @jlindbloom, i came up with this model, that works quite nicely (the scale of the data is in the range of 1e-12, so i need the normalizing):
nu_min=2.5, nu_mean=30 norm_factor = np.std(data) with pm.Model() as model: n_channels = data.shape n_samples = data.shape mu_normalized = pm.Normal('mu normalized', mu=0, sigma=1, shape=(n_channels, n_samples)) mu = pm.Deterministic('mu', mu_normalized * norm_factor) std_normalized = pm.Uniform('std normalized', lower=1e-1, upper=1e1, shape=(n_channels, 1)) std = pm.Deterministic('std', std_normalized * norm_factor) nu = pm.Exponential('nu', 1 / (nu_mean - nu_min), shape=(n_channels, 1)) + nu_min modeled_data = pm.StudentT('modeled', mu=mu, nu=nu, sigma=std, observed=data)
as i said, this works really well.
but the thing is: the model assumes that the covariance of the channels is diagonal. and this is not the case. the correlation is actually quite high and i would like to factor this into my model.
following a suggestion by @jlindbloom , i came up with this:
with pm.Model() as model: n_channels = data.shape n_samples = data.shape mu_normalized = pm.Normal('mu normalized', mu=0, sigma=1, shape=(n_channels, n_samples)) mu = pm.Deterministic('mu', mu_normalized * norm_factor) chol_normalized, corr, stds_normalized = pm.LKJCholeskyCov( 'chol_normalized', n=2, eta=2.0, sd_dist=pm.Exponential.dist(1.0), compute_corr=True ) nu = pm.Exponential('nu', 1 / (nu_mean - nu_min), shape=(n_channels, 1)) + nu_min chol = pm.Deterministic('chol', chol_normalized * norm_factor) stds = pm.Deterministic('stds', stds_normalized * norm_factor) cov = pm.Deterministic('cov', chol.dot(chol.T)) modeled_data = pm.MvNormal('modeled', mu=mu, chol=chol, observed=data, shape=(n_channels, n_samples))
however, when i try to run this, i get the following error:
File "home/bla/.venv/lib/python3.9/site-packages/pymc3/distributions/multivariate.py", line 116, in _quaddist raise ValueError("Invalid dimension for value: %s" % value.ndim) ValueError: Invalid dimension for value: 3
i tried to pin down the error and it seems that, opposed to the
pm.StudenT distribution, the multivariate ones do not allow a third dimension in the input.
how can i solve this? one possibility would be to fit the model separately for each channel. but i have the strong assumption that the covariance matrix is constant over time, so i would prefer to estimate for all my data.
thanks a lot in advance!