Product of two probability distributions

Apologies if this is a silly question, but I couldn’t find the answer.

I have a transactional dataset that I would like to model so that I can do an A/B test.

The model is:

S(x) = T * V(x)

where T is the probability that a user buys something and V(x) is the probability that a purchase has log value x.

T can be modeled as a pm.Bernoulli and V(x) as a pm.Normal.

Now I would have thought that one could do:

conv  = pm.Bernoulli("conversion", p)  # priors are defined somewhere else before this code
value = pm.Normal("value", mu, sd)
val   = pm.Deterministic("name", conv * value, observed=data)

But that doesn’t work, as pm.Deterministic doesn’t take an observed argument.

How do I specify a probability function that is the product of conv and value so that I can get samples of the posterior for p and mu?

Thanks.

This is a classical problem that I would guess almost every PyMC3 user faces at some point or another. The simplest and most common workaround is to wrap your val in a pm.Normal with a very small sd parameter value (but not so small that you get the “Bad initial energy: -inf” error message).

If you are more strict about the exactness of the model you infer, I’ve read that it may be possible to solve this problem with a custom likelihood using pm.Potential – but I don’t have any experience with that (yet).

A thread of potential interest (since the participants talk about this problem a bit): Help with Multivariate SDE Timeseries


I don’t think the pm.Normal makes a ton of sense here, though. I don’t want to force it to have a small sd parameter.

I also don’t see how Potential can be of use here. I would have imagined there would be a clearer example of how to do this, since, as you say, almost every PyMC3 user faces this at some point (I guess).

@elelias Yes, a Potential would be the way to go. You would want to create two distribution objects, rather than RVs, something along the lines of:

conv = pm.Bernoulli.dist(p=p).logp(data != 0)
value = pm.Normal.dist(mu=mu, sd=sd).logp(data)
pm.Potential('obs', value + conv)  # logps add: a product of probabilities is a sum of log-probabilities

Now I’m actually not quite sure if that results in the right model, but it’s worth a try. You can check https://docs.pymc.io/notebooks/GLM-robust-with-outlier-detection.html for another example of using a Deterministic to define a likelihood.


Hi @twiecki, thanks for your answer. Why do you think it may not be the right model? In the sense that this is not the right way to model the data, or in the sense that it may not work out in PyMC3?

Thinking about this further, this probably is sufficient:

pm.Bernoulli('o1', p=p, observed=data != 0)
pm.Normal('o2', mu=mu, sd=sd, observed=data[data != 0])
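As a sanity check on why that split into two observed variables works: the joint likelihood of this hurdle-style model factors into a Bernoulli term over the purchase indicator and a Normal term over only the nonzero values, so the two separate likelihoods contribute exactly the joint log-likelihood. A numpy-only illustration (toy data and fixed parameter values, all assumed for the example):

```python
import numpy as np

# Toy data: 0 marks "no purchase", nonzero entries are log purchase values.
data = np.array([0.0, 0.0, 1.2, 0.0, 0.8, 1.5])
p, mu, sd = 0.4, 1.0, 0.5  # example parameter values

def bernoulli_logp(k, p):
    return k * np.log(p) + (1 - k) * np.log(1 - p)

def normal_logp(x, mu, sd):
    return -0.5 * ((x - mu) / sd) ** 2 - np.log(sd) - 0.5 * np.log(2 * np.pi)

bought = (data != 0).astype(float)

# Joint hurdle log-likelihood, computed point by point:
# a zero contributes log(1 - p); a purchase contributes log(p) plus
# the Normal log-density of its log value.
joint = sum(
    np.log(1 - p) if x == 0 else np.log(p) + normal_logp(x, mu, sd)
    for x in data
)

# The same quantity as the two separate observed likelihoods accumulate it:
split = bernoulli_logp(bought, p).sum() + normal_logp(data[data != 0], mu, sd).sum()

assert np.isclose(joint, split)
```

The two sums agree for any parameter values, which is why fitting the Bernoulli and the Normal on their respective slices of the data gives the same posterior for p, mu, and sd as the product model.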

hmmm, that would be extremely elegant. Thanks for the suggestion, I’ll give it a go.

You might be able to just write a custom distribution. These two posts might be helpful.

There also seems to be a similar post on this where a custom distribution and a potential were used.


@elelias Were you able to figure out a solution? I’m trying to solve a similar problem, and got stuck.


There seems to be no resolution to this discussion. :confused: