Hi Alex,
Thank you for the reply! I understand how I would index the RV if I wanted to index it to some specific input in the 2D array, like you describe mu[:, 0]
, but what I want to do is different (or I’m just thick and missing something):
In my case, I want one RV for each gene / dataset pairing:
| | D1 | D2 | D3 | D4 |
|----|-----------|-----------|-----------|-----------|
| G1 | $X_{1,1}$ | $X_{1,2}$ | $X_{1,3}$ | $X_{1,4}$ |
| G2 | $X_{2,1}$ | $X_{2,2}$ | $X_{2,3}$ | $X_{2,4}$ |
| G3 | $X_{3,1}$ | $X_{3,2}$ | $X_{3,3}$ | $X_{3,4}$ |
The number of samples in each dataset is different, so I may have 127 samples in dataset 1 and 75 samples in dataset 2. So for 100 genes and 2 datasets I have 127 * 100 + 75 * 100 = 21,500 values I want to pass in as observations for these (100, 2) RVs.
If the shape of y is scalar, then index
is just vector the same length as obs
that map to the observations. Now that my RVs are in a 2D matrix, I’m trying to figure out how to pass in these 21,500 observations and map them to the the appropriate RV.
For context, here is the complete model with the reshape
approach. I think having two different observed
may indicate I should break it up into separate models?
with pm.Model() as model:
# Constants
training_genes = list(sample.index)
N = bgdf.tissue.nunique()
M = len(training_genes)
MN = M * N
# Gene expression priors
sd = pm.InverseGamma('gm_sd', 1, 1, shape=MN)
mu = pm.Normal('gm_mu', 0, 10, shape=MN)
# Model gene expression
x_hat = pm.Normal('x_hat', mu=mu[ix], sd=sd[ix], shape=MN, observed=obs)
x = pm.Normal('x', mu=mu, sd=sd, shape=MN)
# Likelihood priors
eps = pm.InverseGamma('eps', 1, 1)
beta = pm.Dirichlet("beta", a=np.ones(N))
# Likelihood
y_hat = pm.Deterministic('y_hat', pm.math.dot(x.reshape((M, N)), beta))
y = pm.Laplace('y', mu=y_hat[six], b=eps, observed=sample.values)
(3 datasets, 125 genes)