Pymc-marketing pm.Potential(f())

Thanks to all for a great probabilistic programming package and to @ricardoV94 who has answered my previous inquiries.
In pymc-marketing, after defining the appropriate priors and executing:

bgm = clv.BetaGeoModel(
    data=rfm_data,
    model_config=model_config,
)
bgm.build_model()
bgm

We see:

BG/NBD
         a ~ HalfNormal(0, 10)
         b ~ HalfNormal(0, 50)
     alpha ~ HalfNormal(0, 50)
         r ~ HalfNormal(0, 10)
likelihood ~ Potential(f(r, alpha, b, a))

Though I define the priors for a, b, alpha, and r, I don’t define a likelihood, as we do when creating a model directly in PyMC. My question, then, is: what should I take the likelihood to be, given this:

likelihood ~ Potential(f(r, alpha, b, a))

Hi,

I’m not sure I understand the question. The library defines the likelihood automatically for you, as that Potential term.

It might help to understand what a Potential is and how it differs from the usual variables in a PyMC model.

You note that:

β€œThough i define the priors for a, b, alpha and r, i don’t define a likelihood, as we do when creating a model directly using pymc”.

I actually dislike the phrase “define a likelihood” in that quote. What you are doing in PyMC is never defining a likelihood; instead you’re describing a data-generating process, some elements of which are observed.

In this simple example:

import numpy as np
import pymc as pm

with pm.Model() as m:
    mu = pm.Normal('mu')
    sigma = pm.Exponential('sigma', 1)
    obs = pm.Normal('obs', mu=mu, sigma=sigma,
                    observed=np.random.normal(loc=0, scale=10, size=(100,)))

It is common in examples and older code to give what I called obs the name likelihood. I dislike this because what was defined in this code is a distribution. If you evaluate obs using e.g. obs.eval(), you will not get back the likelihood of the observed data; you will instead get back a draw from the distribution N(mu, sigma), conditioned on draws from mu and sigma. This is illustrated by the m.to_graphviz() method, which shows you the graph of the data-generating process in plate notation.
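To make the distinction concrete, here is a minimal sketch continuing from the model above (compile_logp and initial_point are standard pm.Model methods; the printed numbers will vary from run to run):

# Evaluating obs gives a forward draw, not a likelihood:
print(obs.eval())  # an array of 100 draws from N(mu, sigma)

# The log probability is a separate graph that PyMC builds on request,
# e.g. the joint model logp evaluated at the model's initial point:
logp_fn = m.compile_logp()
print(logp_fn(m.initial_point()))  # a single float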

What PyMC does for you is take this generative model and transform it into a log probability, conditioned on observed data where appropriate. You can see this by looking at the computation graph with m.logp().dprint():

Sum{axes=None} [id A] '__logp'
 └─ MakeVector{dtype='float64'} [id B]
    β”œβ”€ Sum{axes=None} [id C]
    β”‚  └─ Check{sigma > 0} [id D] 'mu_logprob'
    β”‚     β”œβ”€ Sub [id E]
    β”‚     β”‚  β”œβ”€ Sub [id F]
    β”‚     β”‚  β”‚  β”œβ”€ Mul [id G]
    β”‚     β”‚  β”‚  β”‚  β”œβ”€ -0.5 [id H]
    β”‚     β”‚  β”‚  β”‚  └─ Pow [id I]
    β”‚     β”‚  β”‚  β”‚     β”œβ”€ True_div [id J]
    β”‚     β”‚  β”‚  β”‚     β”‚  β”œβ”€ Sub [id K]
    β”‚     β”‚  β”‚  β”‚     β”‚  β”‚  β”œβ”€ mu [id L]
    β”‚     β”‚  β”‚  β”‚     β”‚  β”‚  └─ 0 [id M]
    β”‚     β”‚  β”‚  β”‚     β”‚  └─ 1.0 [id N]
    β”‚     β”‚  β”‚  β”‚     └─ 2 [id O]
    β”‚     β”‚  β”‚  └─ Log [id P]
    β”‚     β”‚  β”‚     └─ Sqrt [id Q]
    β”‚     β”‚  β”‚        └─ 6.283185307179586 [id R]
    β”‚     β”‚  └─ Log [id S]
    β”‚     β”‚     └─ 1.0 [id N]
    β”‚     └─ All{axes=None} [id T]
    β”‚        └─ MakeVector{dtype='bool'} [id U]
    β”‚           └─ All{axes=None} [id V]
    β”‚              └─ Gt [id W]
    β”‚                 β”œβ”€ 1.0 [id N]
    β”‚                 └─ 0 [id X]
    β”œβ”€ Sum{axes=None} [id Y]
    β”‚  └─ Add [id Z] 'sigma_log___logprob'
    β”‚     β”œβ”€ Check{mu >= 0} [id BA]
    β”‚     β”‚  β”œβ”€ Switch [id BB]
    β”‚     β”‚  β”‚  β”œβ”€ Ge [id BC]
    β”‚     β”‚  β”‚  β”‚  β”œβ”€ Exp [id BD]
    β”‚     β”‚  β”‚  β”‚  β”‚  └─ sigma_log__ [id BE]
    β”‚     β”‚  β”‚  β”‚  └─ 0.0 [id BF]
    β”‚     β”‚  β”‚  β”œβ”€ Sub [id BG]
    β”‚     β”‚  β”‚  β”‚  β”œβ”€ Neg [id BH]
    β”‚     β”‚  β”‚  β”‚  β”‚  └─ Log [id BI]
    β”‚     β”‚  β”‚  β”‚  β”‚     └─ 1.0 [id N]
    β”‚     β”‚  β”‚  β”‚  └─ True_div [id BJ]
    β”‚     β”‚  β”‚  β”‚     β”œβ”€ Exp [id BD]
    β”‚     β”‚  β”‚  β”‚     β”‚  └─ Β·Β·Β·
    β”‚     β”‚  β”‚  β”‚     └─ 1.0 [id N]
    β”‚     β”‚  β”‚  └─ -inf [id BK]
    β”‚     β”‚  └─ All{axes=None} [id BL]
    β”‚     β”‚     └─ MakeVector{dtype='bool'} [id BM]
    β”‚     β”‚        └─ All{axes=None} [id BN]
    β”‚     β”‚           └─ Ge [id BO]
    β”‚     β”‚              β”œβ”€ 1.0 [id N]
    β”‚     β”‚              └─ 0 [id BP]
    β”‚     └─ Identity [id BQ] 'sigma_log___log_jacobian'
    β”‚        └─ sigma_log__ [id BE]
    └─ Sum{axes=None} [id BR]
       └─ Check{sigma > 0} [id BS] 'obs_logprob'
          β”œβ”€ Sub [id BT]
          β”‚  β”œβ”€ Sub [id BU]
          β”‚  β”‚  β”œβ”€ Mul [id BV]
          β”‚  β”‚  β”‚  β”œβ”€ ExpandDims{axis=0} [id BW]
          β”‚  β”‚  β”‚  β”‚  └─ -0.5 [id BX]
          β”‚  β”‚  β”‚  └─ Pow [id BY]
          β”‚  β”‚  β”‚     β”œβ”€ True_div [id BZ]
          β”‚  β”‚  β”‚     β”‚  β”œβ”€ Sub [id CA]
          β”‚  β”‚  β”‚     β”‚  β”‚  β”œβ”€ obs{[  6.28412 ... .0402883 ]} [id CB]
          β”‚  β”‚  β”‚     β”‚  β”‚  └─ ExpandDims{axis=0} [id CC]
          β”‚  β”‚  β”‚     β”‚  β”‚     └─ mu [id L]
          β”‚  β”‚  β”‚     β”‚  └─ ExpandDims{axis=0} [id CD]
          β”‚  β”‚  β”‚     β”‚     └─ Exp [id CE]
          β”‚  β”‚  β”‚     β”‚        └─ sigma_log__ [id BE]
          β”‚  β”‚  β”‚     └─ ExpandDims{axis=0} [id CF]
          β”‚  β”‚  β”‚        └─ 2 [id CG]
          β”‚  β”‚  └─ ExpandDims{axis=0} [id CH]
          β”‚  β”‚     └─ Log [id CI]
          β”‚  β”‚        └─ Sqrt [id CJ]
          β”‚  β”‚           └─ 6.283185307179586 [id CK]
          β”‚  └─ Log [id CL]
          β”‚     └─ ExpandDims{axis=0} [id CD]
          β”‚        └─ Β·Β·Β·
          └─ All{axes=None} [id CM]
             └─ MakeVector{dtype='bool'} [id CN]
                └─ All{axes=None} [id CO]
                   └─ Gt [id CP]
                      β”œβ”€ ExpandDims{axis=0} [id CD]
                      β”‚  └─ Β·Β·Β·
                      └─ ExpandDims{axis=0} [id CQ]
                         └─ 0 [id CR]

These admittedly take some getting used to reading, but if you look through it, you’ll see some familiar things. For example:

          β”‚  β”‚  β”‚     β”œβ”€ True_div [id BZ]
          β”‚  β”‚  β”‚     β”‚  β”œβ”€ Sub [id CA]
          β”‚  β”‚  β”‚     β”‚  β”‚  β”œβ”€ obs{[  6.28412 ... .0402883 ]} [id CB]
          β”‚  β”‚  β”‚     β”‚  β”‚  └─ ExpandDims{axis=0} [id CC]
          β”‚  β”‚  β”‚     β”‚  β”‚     └─ mu [id L]
          β”‚  β”‚  β”‚     β”‚  └─ ExpandDims{axis=0} [id CD]
          β”‚  β”‚  β”‚     β”‚     └─ Exp [id CE]
          β”‚  β”‚  β”‚     β”‚        └─ sigma_log__ [id BE]

This chunk is computing the kernel of the normal likelihood, \frac{x - \mu}{\sigma}.
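For reference, the full normal log-density that the surrounding nodes assemble is:

\log p(x \mid \mu, \sigma) = -\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2 - \log \sigma - \frac{1}{2}\log 2\pi

Matching terms to nodes: the Mul by -0.5 and Pow by 2 produce the squared term, the Log of Sqrt(6.283185...) is \frac{1}{2}\log 2\pi, and the final Log node subtracts \log \sigma.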

So this is how a “normal” PyMC model works: you define a generative graph that produces data, show it some data, and PyMC works out the probability of the whole system, given data and parameters.

pm.Potential is our way of side-stepping the generative process and working with the logp graph directly. Suppose we add a potential term to this simple model:

data = np.random.normal(loc=0, scale=10, size=(100,))

with pm.Model() as m2:
    mu = pm.Normal('mu')
    sigma = pm.Exponential('sigma', 1)
    obs = pm.Normal('obs', mu=mu, sigma=sigma, observed=data)
    pm.Potential('positivity_constraint', pm.math.switch(pm.math.ge(obs, 0), 0, -np.inf))

If you go to do prior sampling on this model with e.g. pm.sample_prior_predictive, you’ll get this warning:

UserWarning: The effect of Potentials on other parameters is ignored during prior predictive sampling. This is likely to lead to invalid or biased predictive samples.
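You can verify this directly: forward sampling ignores the constraint entirely. A quick sketch (the draws keyword name may differ slightly across PyMC versions):

with m2:
    prior = pm.sample_prior_predictive(draws=500)

# The Potential does not constrain forward draws: a large fraction of the
# prior predictive draws of obs are still negative.
print((prior.prior_predictive["obs"].values < 0).mean())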

This is because the Potential doesn’t live on the generative graph. Our generative model will still produce negative draws of obs, because the potential only lives on the logp graph. If we check m2.logp().dprint() we can see it:

Sum{axes=None} [id A] '__logp'
 └─ MakeVector{dtype='float64'} [id B]
    β”œβ”€ Sum{axes=None} [id C]
    β”‚  └─ Check{sigma > 0} [id D] 'mu_logprob'
    β”‚     β”œβ”€ Sub [id E]
    β”‚     β”‚  β”œβ”€ Sub [id F]
    β”‚     β”‚  β”‚  β”œβ”€ Mul [id G]
    β”‚     β”‚  β”‚  β”‚  β”œβ”€ -0.5 [id H]
    β”‚     β”‚  β”‚  β”‚  └─ Pow [id I]
    β”‚     β”‚  β”‚  β”‚     β”œβ”€ True_div [id J]
    β”‚     β”‚  β”‚  β”‚     β”‚  β”œβ”€ Sub [id K]
    β”‚     β”‚  β”‚  β”‚     β”‚  β”‚  β”œβ”€ mu [id L]
    β”‚     β”‚  β”‚  β”‚     β”‚  β”‚  └─ 0 [id M]
    β”‚     β”‚  β”‚  β”‚     β”‚  └─ 1.0 [id N]
    β”‚     β”‚  β”‚  β”‚     └─ 2 [id O]
    β”‚     β”‚  β”‚  └─ Log [id P]
    β”‚     β”‚  β”‚     └─ Sqrt [id Q]
    β”‚     β”‚  β”‚        └─ 6.283185307179586 [id R]
    β”‚     β”‚  └─ Log [id S]
    β”‚     β”‚     └─ 1.0 [id N]
    β”‚     └─ All{axes=None} [id T]
    β”‚        └─ MakeVector{dtype='bool'} [id U]
    β”‚           └─ All{axes=None} [id V]
    β”‚              └─ Gt [id W]
    β”‚                 β”œβ”€ 1.0 [id N]
    β”‚                 └─ 0 [id X]
    β”œβ”€ Sum{axes=None} [id Y]
    β”‚  └─ Add [id Z] 'sigma_log___logprob'
    β”‚     β”œβ”€ Check{mu >= 0} [id BA]
    β”‚     β”‚  β”œβ”€ Switch [id BB]
    β”‚     β”‚  β”‚  β”œβ”€ Ge [id BC]
    β”‚     β”‚  β”‚  β”‚  β”œβ”€ Exp [id BD]
    β”‚     β”‚  β”‚  β”‚  β”‚  └─ sigma_log__ [id BE]
    β”‚     β”‚  β”‚  β”‚  └─ 0.0 [id BF]
    β”‚     β”‚  β”‚  β”œβ”€ Sub [id BG]
    β”‚     β”‚  β”‚  β”‚  β”œβ”€ Neg [id BH]
    β”‚     β”‚  β”‚  β”‚  β”‚  └─ Log [id BI]
    β”‚     β”‚  β”‚  β”‚  β”‚     └─ 1.0 [id N]
    β”‚     β”‚  β”‚  β”‚  └─ True_div [id BJ]
    β”‚     β”‚  β”‚  β”‚     β”œβ”€ Exp [id BD]
    β”‚     β”‚  β”‚  β”‚     β”‚  └─ Β·Β·Β·
    β”‚     β”‚  β”‚  β”‚     └─ 1.0 [id N]
    β”‚     β”‚  β”‚  └─ -inf [id BK]
    β”‚     β”‚  └─ All{axes=None} [id BL]
    β”‚     β”‚     └─ MakeVector{dtype='bool'} [id BM]
    β”‚     β”‚        └─ All{axes=None} [id BN]
    β”‚     β”‚           └─ Ge [id BO]
    β”‚     β”‚              β”œβ”€ 1.0 [id N]
    β”‚     β”‚              └─ 0 [id BP]
    β”‚     └─ Identity [id BQ] 'sigma_log___log_jacobian'
    β”‚        └─ sigma_log__ [id BE]
    β”œβ”€ Sum{axes=None} [id BR]
    β”‚  └─ Check{sigma > 0} [id BS] 'obs_logprob'
    β”‚     β”œβ”€ Sub [id BT]
    β”‚     β”‚  β”œβ”€ Sub [id BU]
    β”‚     β”‚  β”‚  β”œβ”€ Mul [id BV]
    β”‚     β”‚  β”‚  β”‚  β”œβ”€ ExpandDims{axis=0} [id BW]
    β”‚     β”‚  β”‚  β”‚  β”‚  └─ -0.5 [id BX]
    β”‚     β”‚  β”‚  β”‚  └─ Pow [id BY]
    β”‚     β”‚  β”‚  β”‚     β”œβ”€ True_div [id BZ]
    β”‚     β”‚  β”‚  β”‚     β”‚  β”œβ”€ Sub [id CA]
    β”‚     β”‚  β”‚  β”‚     β”‚  β”‚  β”œβ”€ obs{[  7.73650 ... .62535995]} [id CB]
    β”‚     β”‚  β”‚  β”‚     β”‚  β”‚  └─ ExpandDims{axis=0} [id CC]
    β”‚     β”‚  β”‚  β”‚     β”‚  β”‚     └─ mu [id L]
    β”‚     β”‚  β”‚  β”‚     β”‚  └─ ExpandDims{axis=0} [id CD]
    β”‚     β”‚  β”‚  β”‚     β”‚     └─ Exp [id CE]
    β”‚     β”‚  β”‚  β”‚     β”‚        └─ sigma_log__ [id BE]
    β”‚     β”‚  β”‚  β”‚     └─ ExpandDims{axis=0} [id CF]
    β”‚     β”‚  β”‚  β”‚        └─ 2 [id CG]
    β”‚     β”‚  β”‚  └─ ExpandDims{axis=0} [id CH]
    β”‚     β”‚  β”‚     └─ Log [id CI]
    β”‚     β”‚  β”‚        └─ Sqrt [id CJ]
    β”‚     β”‚  β”‚           └─ 6.283185307179586 [id CK]
    β”‚     β”‚  └─ Log [id CL]
    β”‚     β”‚     └─ ExpandDims{axis=0} [id CD]
    β”‚     β”‚        └─ Β·Β·Β·
    β”‚     └─ All{axes=None} [id CM]
    β”‚        └─ MakeVector{dtype='bool'} [id CN]
    β”‚           └─ All{axes=None} [id CO]
    β”‚              └─ Gt [id CP]
    β”‚                 β”œβ”€ ExpandDims{axis=0} [id CD]
    β”‚                 β”‚  └─ Β·Β·Β·
    β”‚                 └─ ExpandDims{axis=0} [id CQ]
    β”‚                    └─ 0 [id CR]
    └─ Cast{float64} [id CS]
       └─ Sum{axes=None} [id CT]
          └─ Switch [id CU] 'positivity_constraint'
             β”œβ”€ Ge [id CV]
             β”‚  β”œβ”€ obs{[  7.73650 ... .62535995]} [id CB]
             β”‚  └─ ExpandDims{axis=0} [id CW]
             β”‚     └─ 0 [id CX]
             β”œβ”€ ExpandDims{axis=0} [id CY]
             β”‚  └─ 0 [id CZ]
             └─ ExpandDims{axis=0} [id DA]
                └─ -inf [id DB]

It’s all the way at the bottom. And if you look at the very top, all the terms on this graph are fed into a single Sum{axes=None} [id A] '__logp' operation. This means that whatever you pass to pm.Potential is automatically added to the logp of the model. I deliberately did not assign the potential in the second model to a variable, to emphasize this point.
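You can check this numerically with a toy comparison: adding a constant Potential shifts the model logp by exactly that constant. A minimal sketch (base, with_potential, and shift are illustrative names; pt is pytensor.tensor):

import numpy as np
import pymc as pm
import pytensor.tensor as pt

with pm.Model() as base:
    mu = pm.Normal("mu")

with pm.Model() as with_potential:
    mu = pm.Normal("mu")
    pm.Potential("shift", pt.constant(5.0))

point = {"mu": np.array(0.0)}
lp_base = base.compile_logp()(point)
lp_shifted = with_potential.compile_logp()(point)
print(lp_shifted - lp_base)  # 5.0: the potential is simply summed into the logp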

So to make a long story short, pm.Potential defines a logp term that doesn’t live in the generative model you declare in a “normal” PyMC workflow. Evidently, this is what the pymc-marketing model you are studying uses. It doesn’t mean anything is wrong; it just means the devs decided it was more convenient to work directly with the logp terms than with the generative process.
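If you want to see this in the pymc-marketing case itself, you can inspect its logp graph the same way. A hedged sketch, assuming the built pm.Model is exposed as the model attribute on the object after build_model() (check your pymc-marketing version):

# The Potential named 'likelihood' shows up as one of the terms summed
# into '__logp', just like in the toy example above.
bgm.model.logp().dprint()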

Thank you @jessegrabowski for a thoughtful and thorough response. I sincerely appreciate it. I just watched your interview in Italy recorded during the PyMC retreat. Hope to learn more from you.

Thank you @ricardoV94. I was unaware that the pymc-marketing library defines the likelihood automatically.