A variable simultaneously follow two distributions

Hello, I was trying to reproduce a model from a research paper, long story short, in their model, they have a 2D matrix denoted as H, where they said each entry in H has a prior distribution beta ~ (u,sigma), and the u and sigma can be determined by the domain knowledge. The important point is each entry follows its own beta distribution, so H_{ij} ~ Beta(u_{ij},sigma_{ij}), I believe this can be easily implemented like below:

# assuming H is 100*100
mu = np.empty((100,100))
sigma = np.empty((100,100))
H = pm.Beta(mu=mu,sigma=sigma,shapes=(100,100))

But following that, they said each column of H must sum up to one, because of that, each column of matrix H has a prior distribution of Dirichlet(alpha), where alpha is a vector of length 100. Now it seems that I have to write something like below:

H = pm.Dirichlet(alpha=np.empty((100,100)),shape=(100,100))

My question is, is it valid in Pymc to define a stochastic variable that is generated from two different distributions at the same time as I haven’t seen such things in the tutorial? I apologize if my question is too naive as I am just learning the bayesian modeling and am happy to further clarify my question.

Many thanks in advance,
Frank

You can’t do that. If H is not observed you can just subtract the column mean and add 1?

I don’t know how you put alpha in the picture though. Can you link to the paper?

Sure and sorry for the confusion, here’s the link to the paper’s method section (Cell type identification in spatial transcriptomics data can be improved by leveraging cell-type-informative paired tissue images using a Bayesian probabilistic model | Nucleic Acids Research | Oxford Academic). And the stan code I believe that relevant to H is here (https://github.com/asifzubair/GIST/blob/main/inst/stan/GIST_enhanced_model.stan#L55-L57). Let me know if I can clarify anything.

I see… I think you can do that with a Potential or CustomDist. Something like this (untested):


def logp(H, mu, sigma, alpha):
  logp_entries = pm.logp(pm.Beta.dist(mu=mu, sigma=sigma), H)
  logp_rows = pm.logp(pm.Dirichlet.dist(alpha), H)
  return logp_entries.sum(axis=-1) + logp_rows

with pm.Model() as m:
  mu = ...
  sigma = ...
  alpha = ...
  H = pm.CustomDist("H", mu, sigma, alpha, logp=logp, ndim_supp=1, ndims_params=[0, 0, 1]) 

That model adds the Dirichlet prior over rows (alpha.shape == H.shape=[-1]) if you had a single alpha just like a vanilla Dirichlet distribution with batched dimensions. I think the Stan example is doing the same, but please double check.

If you want the constraint over columns you can do that as well, just need to shape things around. By default the support dimensions (that is dimensions that are not independent) are to the rightmost position so this is more natural. Distribution Dimensionality — PyMC 5.0.1 documentation

Thanks for much for the examples and explanations!