A variable simultaneously follow two distributions

frankligy · January 13, 2023, 3:07pm

Hello, I was trying to reproduce a model from a research paper, long story short, in their model, they have a 2D matrix denoted as H, where they said each entry in H has a prior distribution beta ~ (u,sigma), and the u and sigma can be determined by the domain knowledge. The important point is each entry follows its own beta distribution, so H_{ij} ~ Beta(u_{ij},sigma_{ij}), I believe this can be easily implemented like below:

# assuming H is 100*100
mu = np.empty((100,100))
sigma = np.empty((100,100))
H = pm.Beta(mu=mu,sigma=sigma,shapes=(100,100))

But following that, they said each column of H must sum up to one, because of that, each column of matrix H has a prior distribution of Dirichlet(alpha), where alpha is a vector of length 100. Now it seems that I have to write something like below:

H = pm.Dirichlet(alpha=np.empty((100,100)),shape=(100,100))

My question is, is it valid in Pymc to define a stochastic variable that is generated from two different distributions at the same time as I haven’t seen such things in the tutorial? I apologize if my question is too naive as I am just learning the bayesian modeling and am happy to further clarify my question.

Many thanks in advance,
Frank

ricardoV94 · January 13, 2023, 8:43pm

You can’t do that. If H is not observed you can just subtract the column mean and add 1?

I don’t know how you put alpha in the picture though. Can you link to the paper?

frankligy · January 13, 2023, 9:37pm

Sure and sorry for the confusion, here’s the link to the paper’s method section (Cell type identification in spatial transcriptomics data can be improved by leveraging cell-type-informative paired tissue images using a Bayesian probabilistic model | Nucleic Acids Research | Oxford Academic). And the stan code I believe that relevant to H is here (https://github.com/asifzubair/GIST/blob/main/inst/stan/GIST_enhanced_model.stan#L55-L57). Let me know if I can clarify anything.

ricardoV94 · January 14, 2023, 7:16am

I see… I think you can do that with a Potential or CustomDist. Something like this (untested):


def logp(H, mu, sigma, alpha):
  logp_entries = pm.logp(pm.Beta.dist(mu=mu, sigma=sigma), H)
  logp_rows = pm.logp(pm.Dirichlet.dist(alpha), H)
  return logp_entries.sum(axis=-1) + logp_rows

with pm.Model() as m:
  mu = ...
  sigma = ...
  alpha = ...
  H = pm.CustomDist("H", mu, sigma, alpha, logp=logp, ndim_supp=1, ndims_params=[0, 0, 1])

That model adds the Dirichlet prior over rows (alpha.shape == H.shape=[-1]) if you had a single alpha just like a vanilla Dirichlet distribution with batched dimensions. I think the Stan example is doing the same, but please double check.

If you want the constraint over columns you can do that as well, just need to shape things around. By default the support dimensions (that is dimensions that are not independent) are to the rightmost position so this is more natural. Distribution Dimensionality — PyMC 5.0.1 documentation

frankligy · January 14, 2023, 3:17pm

Thanks for much for the examples and explanations!

Topic		Replies	Views
Posterior distributions in a confusion matrix Questions	0	695	October 24, 2019
Multivariate linear regression with different prior distributions for coefficients v3 modeling	0	533	May 7, 2022
How to set some parts of an array of distributions to a deterministic value? Questions	6	1339	March 3, 2021
Modeling multivariate distributions for bayesian optimal inference v3	4	517	June 19, 2023
Modeling bivariate beta distributions Questions	7	2084	January 7, 2022

A variable simultaneously follow two distributions

Related topics