Product of two probability distributions

Apologies if this is a silly question, but I couldn’t find the answer.

I have a transactional dataset that I would like to model so that I can do an A/B test.

The model is:

S(x) = T * V(x)

where T is the probability that a user buys something and V(x) is the probability that a purchase has log value x.

T can be modeled as a pm.Bernoulli and V(x) as a pm.Normal.

Now I would have thought that one could do:

conv  = pm.Bernoulli("conversion", p)  # priors are defined somewhere else before this code
value = pm.Normal("value", mu, sd)
val   = pm.Deterministic("name", conv * value, observed=data)

But that doesn’t work, as pm.Deterministic doesn’t take an observed argument.

How do I specify a probability function that is the product of conv and value so that I can get samples of the posterior for p and mu?

Thanks.

This is a classical problem that I would guess almost every PyMC3 user faces at some point or another. The simplest and most common workaround is to wrap your val in a pm.Normal with a very small sd parameter value (but not so small that you get the “Bad initial energy: -inf” error message).

If you are more strict about the exactness of the model you infer, I’ve read that it may be possible to solve this problem with a custom likelihood using pm.Potential – but I don’t have any experience with that (yet).

A thread of potential interest (since the participants talk about this problem a bit): Help with Multivariate SDE Timeseries


I don’t think the pm.Normal makes a ton of sense here, though. I don’t want to force it to have a small sd parameter.

I also don’t see how Potential can be of use here. I would have imagined there would be a clearer example of how to do this, since, as you say, almost every PyMC3 user faces this at some point (I guess).

@elelias Yes, a Potential would be the way to go. You would want to create two distribution objects, rather than RVs, something along the lines of:

conv = pm.Bernoulli.dist(p=p).logp(data != 0)
value = pm.Normal.dist(mu=mu, sd=sd).logp(data)
pm.Potential('obs', value + conv)  # logps add: a product of probabilities is a sum of log-probabilities

Now I’m actually not quite sure if that results in the right model, but it’s worth a try. You can check https://docs.pymc.io/notebooks/GLM-robust-with-outlier-detection.html for another example of using a Deterministic to define a likelihood.


Hi @twiecki, thanks for your answer. Why do you think it may not be the right model? In the sense that this is not the right way to model the data, or in the sense that it may not work out in PyMC3?

Thinking about this further, this probably is sufficient:

pm.Bernoulli('o1', p=p, observed=data != 0)
pm.Normal('o2', mu=mu, sd=sd, observed=data[data != 0])
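As a sanity check on why that split into two observed variables works: the joint likelihood of this hurdle-style model factors into a Bernoulli term over the purchase indicator and a Normal term over only the nonzero values, so the two separate likelihoods contribute exactly the joint log-likelihood. A numpy-only illustration (toy data and fixed parameter values, all assumed for the example):

```python
import numpy as np

# Toy data: 0 marks "no purchase", nonzero entries are log purchase values.
data = np.array([0.0, 0.0, 1.2, 0.0, 0.8, 1.5])
p, mu, sd = 0.4, 1.0, 0.5  # example parameter values

def bernoulli_logp(k, p):
    return k * np.log(p) + (1 - k) * np.log(1 - p)

def normal_logp(x, mu, sd):
    return -0.5 * ((x - mu) / sd) ** 2 - np.log(sd) - 0.5 * np.log(2 * np.pi)

bought = (data != 0).astype(float)

# Joint hurdle log-likelihood, computed point by point:
# a zero contributes log(1 - p); a purchase contributes log(p) plus
# the Normal log-density of its log value.
joint = sum(
    np.log(1 - p) if x == 0 else np.log(p) + normal_logp(x, mu, sd)
    for x in data
)

# The same quantity as the two separate observed likelihoods accumulate it:
split = bernoulli_logp(bought, p).sum() + normal_logp(data[data != 0], mu, sd).sum()

assert np.isclose(joint, split)
```

The two sums agree for any parameter values, which is why fitting the Bernoulli and the Normal on their respective slices of the data gives the same posterior for p, mu, and sd as the product model.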

hmmm, that would be extremely elegant. Thanks for the suggestion, I’ll give it a go.

You might be able to just write a custom distribution. These two posts might be helpful.

There also seems to be a similar post on this where a custom distribution and a potential were used.


@elelias Were you able to figure out a solution? I’m trying to solve a similar problem, and got stuck.


There seems to be no resolution to this discussion. :confused: