It might help to understand what a Potential is and how it differs from the rest of a usual PyMC model.
You note that:
"Though i define the priors for a, b, alpha and r, i don't define a likelihood, as we do when creating a model directly using pymc".
I actually dislike that "define a likelihood" language. What you are doing in PyMC is never defining a likelihood; instead, you're describing a data generating process, some elements of which are observed.
In this simple example:
import numpy as np
import pymc as pm

data = np.random.normal(loc=0, scale=10, size=(100,))
with pm.Model() as m:
    mu = pm.Normal('mu')
    sigma = pm.Exponential('sigma', 1)
    obs = pm.Normal('obs', mu=mu, sigma=sigma, observed=data)
It is common in examples and older code to give what I called "obs" the name "likelihood". I dislike this, because what is defined here in this code is a distribution. If you evaluate obs using e.g. obs.eval(), you will not get back the likelihood of the observed data; you will instead get back a draw from the distribution N(mu, sigma), conditioned on draws from mu and sigma. This is illustrated by the m.to_graphviz() method, which shows you the graph of the data generating process in plate notation.
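To make that concrete, here is a quick check you can run (just a sketch; the numbers will differ on every run because the draws are random):

obs.eval()    # an array of 100 simulated values drawn from N(mu, sigma)
pm.draw(obs)  # same idea, using PyMC's drawing helper

Both calls simulate data from the generative graph; neither returns a probability of anything.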
What PyMC does for you is take this generative model and transform it into a log probability, conditioned on observed data where appropriate. You can see this by looking at the computation graph m.logp().dprint():
Sum{axes=None} [id A] '__logp'
 └─ MakeVector{dtype='float64'} [id B]
    ├─ Sum{axes=None} [id C]
    │  └─ Check{sigma > 0} [id D] 'mu_logprob'
    │     ├─ Sub [id E]
    │     │  ├─ Sub [id F]
    │     │  │  ├─ Mul [id G]
    │     │  │  │  ├─ -0.5 [id H]
    │     │  │  │  └─ Pow [id I]
    │     │  │  │     ├─ True_div [id J]
    │     │  │  │     │  ├─ Sub [id K]
    │     │  │  │     │  │  ├─ mu [id L]
    │     │  │  │     │  │  └─ 0 [id M]
    │     │  │  │     │  └─ 1.0 [id N]
    │     │  │  │     └─ 2 [id O]
    │     │  │  └─ Log [id P]
    │     │  │     └─ Sqrt [id Q]
    │     │  │        └─ 6.283185307179586 [id R]
    │     │  └─ Log [id S]
    │     │     └─ 1.0 [id N]
    │     └─ All{axes=None} [id T]
    │        └─ MakeVector{dtype='bool'} [id U]
    │           └─ All{axes=None} [id V]
    │              └─ Gt [id W]
    │                 ├─ 1.0 [id N]
    │                 └─ 0 [id X]
    ├─ Sum{axes=None} [id Y]
    │  └─ Add [id Z] 'sigma_log___logprob'
    │     ├─ Check{mu >= 0} [id BA]
    │     │  ├─ Switch [id BB]
    │     │  │  ├─ Ge [id BC]
    │     │  │  │  ├─ Exp [id BD]
    │     │  │  │  │  └─ sigma_log__ [id BE]
    │     │  │  │  └─ 0.0 [id BF]
    │     │  │  ├─ Sub [id BG]
    │     │  │  │  ├─ Neg [id BH]
    │     │  │  │  │  └─ Log [id BI]
    │     │  │  │  │     └─ 1.0 [id N]
    │     │  │  │  └─ True_div [id BJ]
    │     │  │  │     ├─ Exp [id BD]
    │     │  │  │     │  └─ ···
    │     │  │  │     └─ 1.0 [id N]
    │     │  │  └─ -inf [id BK]
    │     │  └─ All{axes=None} [id BL]
    │     │     └─ MakeVector{dtype='bool'} [id BM]
    │     │        └─ All{axes=None} [id BN]
    │     │           └─ Ge [id BO]
    │     │              ├─ 1.0 [id N]
    │     │              └─ 0 [id BP]
    │     └─ Identity [id BQ] 'sigma_log___log_jacobian'
    │        └─ sigma_log__ [id BE]
    └─ Sum{axes=None} [id BR]
       └─ Check{sigma > 0} [id BS] 'obs_logprob'
          ├─ Sub [id BT]
          │  ├─ Sub [id BU]
          │  │  ├─ Mul [id BV]
          │  │  │  ├─ ExpandDims{axis=0} [id BW]
          │  │  │  │  └─ -0.5 [id BX]
          │  │  │  └─ Pow [id BY]
          │  │  │     ├─ True_div [id BZ]
          │  │  │     │  ├─ Sub [id CA]
          │  │  │     │  │  ├─ obs{[ 6.28412 ... .0402883 ]} [id CB]
          │  │  │     │  │  └─ ExpandDims{axis=0} [id CC]
          │  │  │     │  │     └─ mu [id L]
          │  │  │     │  └─ ExpandDims{axis=0} [id CD]
          │  │  │     │     └─ Exp [id CE]
          │  │  │     │        └─ sigma_log__ [id BE]
          │  │  │     └─ ExpandDims{axis=0} [id CF]
          │  │  │        └─ 2 [id CG]
          │  │  └─ ExpandDims{axis=0} [id CH]
          │  │     └─ Log [id CI]
          │  │        └─ Sqrt [id CJ]
          │  │           └─ 6.283185307179586 [id CK]
          │  └─ Log [id CL]
          │     └─ ExpandDims{axis=0} [id CD]
          │        └─ ···
          └─ All{axes=None} [id CM]
             └─ MakeVector{dtype='bool'} [id CN]
                └─ All{axes=None} [id CO]
                   └─ Gt [id CP]
                      ├─ ExpandDims{axis=0} [id CD]
                      │  └─ ···
                      └─ ExpandDims{axis=0} [id CQ]
                         └─ 0 [id CR]
These admittedly take some getting used to, but if you look through the output, you'll see some familiar things. For example:
          │  │  │     ├─ True_div [id BZ]
          │  │  │     │  ├─ Sub [id CA]
          │  │  │     │  │  ├─ obs{[ 6.28412 ... .0402883 ]} [id CB]
          │  │  │     │  │  └─ ExpandDims{axis=0} [id CC]
          │  │  │     │  │     └─ mu [id L]
          │  │  │     │  └─ ExpandDims{axis=0} [id CD]
          │  │  │     │     └─ Exp [id CE]
          │  │  │     │        └─ sigma_log__ [id BE]
This chunk is computing \frac{x - \mu}{\sigma}, the standardized residual that appears in the kernel of the normal likelihood.
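Written out, the full expression the obs_logprob branch computes is the normal log density at each observed value:

\log p(x \mid \mu, \sigma) = -\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2 - \log\left(\sigma\sqrt{2\pi}\right)

which is exactly the pattern of Mul, Pow, Sub, Log, and Sqrt nodes you can trace in the graph above.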
So this is how a "normal" PyMC model works: you define a generative graph that produces data, show it some observed data, and PyMC works out the probability of the whole system, given data and parameters.
pm.Potential is our way of side-stepping the generative process and working with the logp graph directly. Suppose we add a potential term to this simple model:
with pm.Model() as m2:
    mu = pm.Normal('mu')
    sigma = pm.Exponential('sigma', 1)
    obs = pm.Normal('obs', mu=mu, sigma=sigma, observed=data)

    pm.Potential('positivity_constraint', pm.math.switch(pm.math.ge(obs, 0), 0, -np.inf))
If you go to do prior sampling on this model with e.g. pm.sample_prior_predictive, you'll get this warning:
UserWarning: The effect of Potentials on other parameters is ignored during prior predictive sampling. This is likely to lead to invalid or biased predictive samples.
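You can see the problem directly by drawing from the prior (a sketch, assuming m2 as defined above; exact counts will vary run to run):

with m2:
    prior = pm.sample_prior_predictive(draws=100)  # emits the warning above

# The "positivity constraint" is ignored when simulating: negative draws of obs still appear
(prior.prior_predictive['obs'].values < 0).any()  # True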
This is because the Potential doesn't live on the generative graph. Our generative model will still produce negative draws of obs, because the potential only lives on the logp graph. If we check m2.logp().dprint(), we can see it:
Sum{axes=None} [id A] '__logp'
 └─ MakeVector{dtype='float64'} [id B]
    ├─ Sum{axes=None} [id C]
    │  └─ Check{sigma > 0} [id D] 'mu_logprob'
    │     ├─ Sub [id E]
    │     │  ├─ Sub [id F]
    │     │  │  ├─ Mul [id G]
    │     │  │  │  ├─ -0.5 [id H]
    │     │  │  │  └─ Pow [id I]
    │     │  │  │     ├─ True_div [id J]
    │     │  │  │     │  ├─ Sub [id K]
    │     │  │  │     │  │  ├─ mu [id L]
    │     │  │  │     │  │  └─ 0 [id M]
    │     │  │  │     │  └─ 1.0 [id N]
    │     │  │  │     └─ 2 [id O]
    │     │  │  └─ Log [id P]
    │     │  │     └─ Sqrt [id Q]
    │     │  │        └─ 6.283185307179586 [id R]
    │     │  └─ Log [id S]
    │     │     └─ 1.0 [id N]
    │     └─ All{axes=None} [id T]
    │        └─ MakeVector{dtype='bool'} [id U]
    │           └─ All{axes=None} [id V]
    │              └─ Gt [id W]
    │                 ├─ 1.0 [id N]
    │                 └─ 0 [id X]
    ├─ Sum{axes=None} [id Y]
    │  └─ Add [id Z] 'sigma_log___logprob'
    │     ├─ Check{mu >= 0} [id BA]
    │     │  ├─ Switch [id BB]
    │     │  │  ├─ Ge [id BC]
    │     │  │  │  ├─ Exp [id BD]
    │     │  │  │  │  └─ sigma_log__ [id BE]
    │     │  │  │  └─ 0.0 [id BF]
    │     │  │  ├─ Sub [id BG]
    │     │  │  │  ├─ Neg [id BH]
    │     │  │  │  │  └─ Log [id BI]
    │     │  │  │  │     └─ 1.0 [id N]
    │     │  │  │  └─ True_div [id BJ]
    │     │  │  │     ├─ Exp [id BD]
    │     │  │  │     │  └─ ···
    │     │  │  │     └─ 1.0 [id N]
    │     │  │  └─ -inf [id BK]
    │     │  └─ All{axes=None} [id BL]
    │     │     └─ MakeVector{dtype='bool'} [id BM]
    │     │        └─ All{axes=None} [id BN]
    │     │           └─ Ge [id BO]
    │     │              ├─ 1.0 [id N]
    │     │              └─ 0 [id BP]
    │     └─ Identity [id BQ] 'sigma_log___log_jacobian'
    │        └─ sigma_log__ [id BE]
    ├─ Sum{axes=None} [id BR]
    │  └─ Check{sigma > 0} [id BS] 'obs_logprob'
    │     ├─ Sub [id BT]
    │     │  ├─ Sub [id BU]
    │     │  │  ├─ Mul [id BV]
    │     │  │  │  ├─ ExpandDims{axis=0} [id BW]
    │     │  │  │  │  └─ -0.5 [id BX]
    │     │  │  │  └─ Pow [id BY]
    │     │  │  │     ├─ True_div [id BZ]
    │     │  │  │     │  ├─ Sub [id CA]
    │     │  │  │     │  │  ├─ obs{[ 7.73650 ... .62535995]} [id CB]
    │     │  │  │     │  │  └─ ExpandDims{axis=0} [id CC]
    │     │  │  │     │  │     └─ mu [id L]
    │     │  │  │     │  └─ ExpandDims{axis=0} [id CD]
    │     │  │  │     │     └─ Exp [id CE]
    │     │  │  │     │        └─ sigma_log__ [id BE]
    │     │  │  │     └─ ExpandDims{axis=0} [id CF]
    │     │  │  │        └─ 2 [id CG]
    │     │  │  └─ ExpandDims{axis=0} [id CH]
    │     │  │     └─ Log [id CI]
    │     │  │        └─ Sqrt [id CJ]
    │     │  │           └─ 6.283185307179586 [id CK]
    │     │  └─ Log [id CL]
    │     │     └─ ExpandDims{axis=0} [id CD]
    │     │        └─ ···
    │     └─ All{axes=None} [id CM]
    │        └─ MakeVector{dtype='bool'} [id CN]
    │           └─ All{axes=None} [id CO]
    │              └─ Gt [id CP]
    │                 ├─ ExpandDims{axis=0} [id CD]
    │                 │  └─ ···
    │                 └─ ExpandDims{axis=0} [id CQ]
    │                    └─ 0 [id CR]
    └─ Cast{float64} [id CS]
       └─ Sum{axes=None} [id CT]
          └─ Switch [id CU] 'positivity_constraint'
             ├─ Ge [id CV]
             │  ├─ obs{[ 7.73650 ... .62535995]} [id CB]
             │  └─ ExpandDims{axis=0} [id CW]
             │     └─ 0 [id CX]
             ├─ ExpandDims{axis=0} [id CY]
             │  └─ 0 [id CZ]
             └─ ExpandDims{axis=0} [id DA]
                └─ -inf [id DB]
It's all the way at the bottom. If you check at the very top, all the terms on this graph are being fed into a Sum{axes=None} [id A] '__logp' operation. What this means is that when you write a pm.Potential, whatever you pass into it will be automatically added to the logp of the model. I deliberately did not assign the potential in the second model to a variable, to emphasize this point.
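A small, hypothetical illustration of that point (a sketch; the model and variable names are made up): two models that differ only by a constant potential should differ in logp by exactly that constant.

import pytensor.tensor as pt

with pm.Model() as base:
    mu = pm.Normal('mu')

with pm.Model() as with_potential:
    mu = pm.Normal('mu')
    pm.Potential('extra_term', pt.as_tensor_variable(-3.0))

point = {'mu': 0.0}
with_potential.compile_logp()(point) - base.compile_logp()(point)  # -3.0

The value passed to the potential is simply added to the model's total logp.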
So, to make a long story short, pm.Potential defines a logp term that doesn't live in the generative model you declare in a "normal" PyMC workflow. Evidently, this is what the pymc-marketing model you are studying does. It doesn't mean anything is wrong; it just means the devs decided it was more convenient to work directly with the logp terms than with the generative process.
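For what it's worth, here is a hedged sketch of what that pattern typically looks like (not the actual pymc-marketing code), reusing the data array from the first example: you compute the log-likelihood term yourself and attach it with a Potential, instead of declaring an observed variable.

with pm.Model() as m3:
    mu = pm.Normal('mu')
    sigma = pm.Exponential('sigma', 1)
    # Same log-likelihood as the observed Normal in m, but added to the logp graph by hand
    loglike = pm.logp(pm.Normal.dist(mu=mu, sigma=sigma), data).sum()
    pm.Potential('loglike', loglike)

Sampling m3 should give the same posterior as m, but, as discussed above, prior and posterior predictive sampling no longer know how the data are generated.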