pm.DensityDist problem under PyMC 4

Dear All,

I upgraded to PyMC 4 and I am having a problem with DensityDist. I read the new documentation and tried to rewrite my model against the new API, but I just cannot get it to work, and I was wondering if anyone could help. This is an implementation of the Fader/Hardie CLV model that ran fine under PyMC3; now I get an error about passing a dictionary to observed.

The code is below:

bgnbd_model = pm.Model()

with bgnbd_model:
    
    # Hyperparameters for lambda, i.e. rate of number of transactions
    sigma_r = pm.HalfNormal('sigma_r', 5.)
    sigma_alpha = pm.HalfNormal('sigma_alpha', 5.)
    r = pm.HalfNormal('r', sd=sigma_r)
    alpha = pm.HalfNormal('alpha', sd=sigma_alpha)

    # Hyperparameters for p, i.e. probability of dropout
    sigma_a = pm.HalfNormal('sigma_a', 5.)
    sigma_b = pm.HalfNormal('sigma_b', 5.)
    a = pm.HalfNormal('a', sd=sigma_a)
    b = pm.HalfNormal('b', sd=sigma_b)

    # Gamma prior on purchasing rate parameter lambda
    # Note: per Fader et al, we use r and alpha to parameterize Gamma(lambda)
    lambda_ = pm.Gamma('lambda', alpha=r, beta=alpha, shape=N, testval=np.random.rand(N))
    
    # Beta prior on probability of dropout p
    p = pm.Beta('p', alpha=a, beta=b, shape=N, testval=np.random.rand(N))
    
    def logp(x, t_x, T):
        """
        Log-likelihood function for an individual customer's purchasing rate lambda
        and probability p of becoming inactive, given their
            - frequency (x)
            - recency (t_x) 
            - time since first purchase (T)
        """    
        delta_x = where(x>0, 1, 0)
        
        A1 = x*log(1-p) + x*log(lambda_) - lambda_*T
        A2 = (log(p) + (x-1)*log(1-p) + x*log(lambda_) - lambda_*t_x)

        A3 = log(exp(A1) + delta_x * exp(A2))
        
        return A3
    
    # Custom distribution for BG-NBD likelihood function
    loglikelihood = pm.DensityDist("loglikelihood", logp, observed={'x': x, 't_x': t_x, 'T': T})

    trace = pm.sample(draws=1_000, return_inferencedata=True, cores=4)

Try:

loglikelihood = pm.DensityDist("loglikelihood", t_x, T, logp=logp, observed=x)

The observed dictionary is no longer a thing.

Thanks, @ricardoV94. I tried it and got this error:

“ValueError: Random variables detected in the logp graph: [p, a, sigma_a, b, sigma_b, lambda, r, sigma_r, alpha, sigma_alpha].
This can happen when DensityDist logp or Interval transform functions reference nonlocal variables.”

I have attached the screenshot.

That happens because your logp references p, but p is not passed as an argument to the function.

You have to make p an input to both the logp function and the DensityDist call, just like you do for t_x and T.

I added p and lambda_ to the logp function, as you suggested (thank you). It starts sampling and then crashes because the loglikelihood evaluates to NaN.

You have to be careful, as the arguments are passed to the logp function by position, not by name.

The observed value always comes first, and the remaining arguments follow the order in which they were passed into the DensityDist call; see the sketch below.
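
Concretely, for the BG-NBD model above it might look something like this (a minimal, untested sketch: the argument order here is just one consistent choice, and the bare where/log/exp calls are replaced with pm.math helpers; what matters is that the logp signature matches the positional arguments of the DensityDist call):

def logp(value, lambda_, p, t_x, T):
    # `value` is the observed frequency x; the remaining arguments arrive
    # in the same order they were passed to DensityDist below
    delta_x = pm.math.where(value > 0, 1, 0)
    A1 = value*pm.math.log(1 - p) + value*pm.math.log(lambda_) - lambda_*T
    A2 = pm.math.log(p) + (value - 1)*pm.math.log(1 - p) + value*pm.math.log(lambda_) - lambda_*t_x
    return pm.math.log(pm.math.exp(A1) + delta_x*pm.math.exp(A2))

loglikelihood = pm.DensityDist(
    "loglikelihood",
    lambda_, p, t_x, T,   # dist parameters, matched to logp by position
    logp=logp,
    observed=x,           # the observed value becomes logp's first argument
)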

You called it. Thank you so much. Sampling now!

On a similar topic, I have the following model,

with pm.Model() as model:
    # The DP priors to obtain w, the cluster weights
    alpha = pm.Gamma('alpha', 1., 1.)
    beta = pm.Beta('beta', 1, alpha, shape=K)
    w = pm.Deterministic('w', stick_breaking(beta))
    theta = pm.Beta('theta', 1, 1, shape=(K, P))
    
    def bernoulli_loglh(theta1, weights):
        def _logp(value):
            value_neg = 1 - value
            logtheta = tt.log(theta1)
            neglogtheta = tt.log(1-theta1)

            # N*K matrix of likelihood contributions from X_i with mu_k
            betas = tt.tensordot(value, logtheta, axes=[1, 1]) + tt.tensordot(value_neg, neglogtheta, axes=[1, 1])

            ## Form alphas, NxK matrix with the component weights included
            alphas = (betas + tt.log(weights)).T

            # Take LSE rowise to get N vector
            # and add alpha_cluster1 to get the total likelihood across all
            # components for each X_i
            lse_clust = pm.math.logsumexp(alphas - alphas[0, :], axis=0) + alphas[0,:]

            # Final overall sum
            return tt.sum(lse_clust)
        return _logp

    obs = pm.DensityDist('obs', w, logp=bernoulli_loglh, observed=theta1)

constructed based on the above thread. I get the error:

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Input In [26], in <cell line: 1>()
     26         return tt.sum(lse_clust)
     27     return _logp
---> 29 obs = pm.DensityDist('obs', w, logp=bernoulli_loglh, observed=theta1)

NameError: name 'theta1' is not defined

How does one fix this error? Thanks!

I think that error is exactly what it says: the variable theta1 is not defined. You need to define something to pass in as observed (see the hypothetical example below).
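
For instance, something along these lines (purely hypothetical data for illustration, assuming N and P are already defined):

import numpy as np

# hypothetical binary data matrix: N observations of P binary features
df = np.random.binomial(1, 0.5, size=(N, P))

Then pass observed=df in the DensityDist call.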

Thanks. I guess I will rephrase. I am trying to use code written for PyMC3 to demonstrate a mixture of Bernoullis (GitHub - stulacy/DPBernoulliMixture: Bayesian modelling of Infinite Bernoulli Mixtures using Dirichlet Processes in pymc3). Here is the code written for PyMC3:

def bernoulli_loglh(theta, weights):
    def _logp(value):
        value_neg = 1 - value
        logtheta = tt.log(theta)
        neglogtheta = tt.log(1-theta)

        # N*K matrix of likelihood contributions from X_i with mu_k
        betas = tt.tensordot(value, logtheta, axes=[1, 1]) + tt.tensordot(value_neg, neglogtheta, axes=[1, 1])

        ## Form alphas, NxK matrix with the component weights included
        alphas = (betas + tt.log(weights)).T

        # Take LSE rowise to get N vector
        # and add alpha_cluster1 to get the total likelihood across all
        # components for each X_i
        lse_clust = pm.math.logsumexp(alphas - alphas[0, :], axis=0) + alphas[0,:]

        # Final overall sum
        return tt.sum(lse_clust)
    return _logp


with pm.Model() as model:
    # The DP priors to obtain w, the cluster weights
    alpha = pm.Gamma('alpha', 1., 1.)
    beta = pm.Beta('beta', 1, alpha, shape=K)
    w = pm.Deterministic('w', stick_breaking(beta))

    theta = pm.Beta('theta', 1, 1, shape=(K, P))
    obs = pm.DensityDist('obs', bernoulli_loglh(theta, w),
                         observed=df)

According to what I read, the new way to write the DensityDist for this case is:

obs = pm.DensityDist('obs', w, bernoulli_loglh, observed=df)

Is this correct? What happens to the first argument of bernoulli_loglh? Thanks.

@cluhmann: I rewrote my routine as follows, and it compiles. However, when sampling the model, I get an error:

with pm.Model() as model:
    # The DP priors to obtain w, the cluster weights
    alpha = pm.Gamma('alpha', 1., 1.)
    beta = pm.Beta('beta', 1, alpha, shape=K)
    w = pm.Deterministic('w', stick_breaking(beta))
    theta = pm.Beta('theta', 1, 1, shape=(K, P))
    
    def bernoulli_loglhh(theta1, weights):
        def _logp(value):
            value_neg = 1 - value
            logtheta = tt.log(theta1)
            neglogtheta = tt.log(1-theta1)

            # N*K matrix of likelihood contributions from X_i with mu_k
            betas = tt.tensordot(value, logtheta, axes=[1, 1]) + tt.tensordot(value_neg, neglogtheta, axes=[1, 1])

            ## Form alphas, NxK matrix with the component weights included
            alphas = (betas + tt.log(weights)).T

            # Take LSE rowise to get N vector
            # and add alpha_cluster1 to get the total likelihood across all
            # components for each X_i
            lse_clust = pm.math.logsumexp(alphas - alphas[0, :], axis=0) + alphas[0,:]

            # Final overall sum
            return tt.sum(lse_clust)
        return _logp

    obs = pm.DensityDist('obs', w, logp=bernoulli_loglhh, observed=df)  # no error without this line

and the model's execution:

with model:
    trace2 = pm.sample(random_seed=17)

which generates the following error message (here is the last line):

AttributeError: 'function' object has no attribute 'owner'

FULL ERROR MESSAGE:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Input In [55], in <cell line: 1>()
      1 with model:
----> 2     trace2 = pm.sample(random_seed=17)

File ~/anaconda3/envs/pymc4/lib/python3.9/site-packages/pymc/sampling.py:524, in sample(draws, step, init, n_init, initvals, trace, chain_idx, chains, cores, tune, progressbar, model, random_seed, discard_tuned_samples, compute_convergence_checks, callback, jitter_max_retries, return_inferencedata, idata_kwargs, mp_ctx, **kwargs)
    521         auto_nuts_init = False
    523 initial_points = None
--> 524 step = assign_step_methods(model, step, methods=pm.STEP_METHODS, step_kwargs=kwargs)
    526 if isinstance(step, list):
    527     step = CompoundStep(step)

File ~/anaconda3/envs/pymc4/lib/python3.9/site-packages/pymc/sampling.py:207, in assign_step_methods(model, step, methods, step_kwargs)
    204 # Use competence classmethods to select step methods for remaining
    205 # variables
    206 selected_steps = defaultdict(list)
--> 207 model_logpt = model.logpt()
    209 for var in model.value_vars:
    210     if var not in assigned_vars:
    211         # determine if a gradient can be computed

File ~/anaconda3/envs/pymc4/lib/python3.9/site-packages/pymc/model.py:745, in Model.logpt(self, vars, jacobian, sum)
    743 rv_logps: List[TensorVariable] = []
    744 if rv_values:
--> 745     rv_logps = joint_logpt(list(rv_values.keys()), rv_values, sum=False, jacobian=jacobian)
    746     assert isinstance(rv_logps, list)
    748 # Replace random variables by their value variables in potential terms

File ~/anaconda3/envs/pymc4/lib/python3.9/site-packages/pymc/distributions/logprob.py:238, in joint_logpt(var, rv_values, jacobian, scaling, transformed, sum, **kwargs)
    234 # Raise if there are unexpected RandomVariables in the logp graph
    235 # Only SimulatorRVs are allowed
    236 from pymc.distributions.simulator import SimulatorRV
--> 238 unexpected_rv_nodes = [
    239     node
    240     for node in aesara.graph.ancestors(list(temp_logp_var_dict.values()))
    241     if (
    242         node.owner
    243         and isinstance(node.owner.op, RandomVariable)
    244         and not isinstance(node.owner.op, SimulatorRV)
    245     )
    246 ]
    247 if unexpected_rv_nodes:
    248     raise ValueError(
    249         f"Random variables detected in the logp graph: {unexpected_rv_nodes}.\n"
    250         "This can happen when DensityDist logp or Interval transform functions "
    251         "reference nonlocal variables."
    252     )

File ~/anaconda3/envs/pymc4/lib/python3.9/site-packages/pymc/distributions/logprob.py:238, in <listcomp>(.0)
    234 # Raise if there are unexpected RandomVariables in the logp graph
    235 # Only SimulatorRVs are allowed
    236 from pymc.distributions.simulator import SimulatorRV
--> 238 unexpected_rv_nodes = [
    239     node
    240     for node in aesara.graph.ancestors(list(temp_logp_var_dict.values()))
    241     if (
    242         node.owner
    243         and isinstance(node.owner.op, RandomVariable)
    244         and not isinstance(node.owner.op, SimulatorRV)
    245     )
    246 ]
    247 if unexpected_rv_nodes:
    248     raise ValueError(
    249         f"Random variables detected in the logp graph: {unexpected_rv_nodes}.\n"
    250         "This can happen when DensityDist logp or Interval transform functions "
    251         "reference nonlocal variables."
    252     )

File ~/anaconda3/envs/pymc4/lib/python3.9/site-packages/aesara/graph/basic.py:845, in ancestors(graphs, blockers)
    842     if r.owner and (not blockers or r not in blockers):
    843         return reversed(r.owner.inputs)
--> 845 yield from cast(Generator[Variable, None, None], walk(graphs, expand, False))

File ~/anaconda3/envs/pymc4/lib/python3.9/site-packages/aesara/graph/basic.py:808, in walk(nodes, expand, bfs, return_children, hash_fn)
    804 if node_hash not in rval_set:
    806     rval_set.add(node_hash)
--> 808     new_nodes: Optional[Iterable[T]] = expand(node)
    810     if return_children:
    811         yield node, new_nodes

File ~/anaconda3/envs/pymc4/lib/python3.9/site-packages/aesara/graph/basic.py:842, in ancestors.<locals>.expand(r)
    841 def expand(r: Variable) -> Optional[Iterator[Variable]]:
--> 842     if r.owner and (not blockers or r not in blockers):
    843         return reversed(r.owner.inputs)

AttributeError: 'function' object has no attribute 'owner'

Here is the correct way to create the custom distribution in PyMC version 4 (there are few examples available):

def _logp(value, theta, weights):
    value_neg = 1 - value
    logtheta = tt.log(theta)
    neglogtheta = tt.log(1-theta)

    # N*K matrix of likelihood contributions from X_i with mu_k
    betas = tt.tensordot(value, logtheta, axes=[1, 1]) + tt.tensordot(value_neg, neglogtheta, axes=[1, 1])

    ## Form alphas, NxK matrix with the component weights included
    alphas = (betas + tt.log(weights)).T

    # Take LSE rowise to get N vector
    # and add alpha_cluster1 to get the total likelihood across all
    # components for each X_i
    lse_clust = pm.math.logsumexp(alphas - alphas[0, :], axis=0) + alphas[0,:]

    # Final overall sum
    return tt.sum(lse_clust)

with a call to DensityDist:

with pm.Model() as model:
    # The DP priors to obtain w, the cluster weights
    alpha = pm.Gamma('alpha', 1., 1.)
    beta = pm.Beta('beta', 1, alpha, shape=K)
    w = pm.Deterministic('w', stick_breaking(beta))
    theta = pm.Beta('theta', 1, 1, shape=(K, P))
    
    obs = pm.DensityDist('obs', theta, w, logp=_logp, observed=df)  

Your “outer” function (i.e., bernoulli_loglhh) returns the “inner” function (i.e., _logp) as its return value, but it needs to return a log (posterior) probability. Maybe this quick vignette will help?

import pymc as pm
import aesara.tensor as at

data = [-1, 0, 1]

# logp function for custom density
# note that the "observed" value of this distribution
# (i.e., `value`) comes first, followed by our extra argument
def my_logp(x, second_argument):
    return second_argument + -1.5 * at.log(1 + x**2)


with pm.Model() as model:
    # ordinary variable
    additional_param = pm.Uniform('additional_param', 0, 1)
    
    # variable following custom density
    custom = pm.DensityDist('custom', additional_param, logp=my_logp)
    
    # likelihood
    like = pm.Normal('like', mu=custom, sigma=1, observed=data)

    idata = pm.sample()

So the sampler will generate a candidate value for additional_param and a separate candidate value for custom. These candidate values will be fed to my_logp(), with x acting as the candidate value of custom and second_argument acting as the candidate value of additional_param. The return value of my_logp() will provide the log posterior of the candidate value for custom (pm.Uniform will provide the log posterior of its candidate value). The likelihood will provide the likelihood of the data.
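
To make that concrete, here is a rough back-of-the-envelope version of the terms being combined for one hypothetical candidate point (ignoring the automatic variable transforms PyMC applies under the hood; the numbers and the use of scipy are purely illustrative):

import numpy as np
from scipy import stats

additional_param, custom_val = 0.3, 0.7    # hypothetical candidate values
data = np.array([-1, 0, 1])

lp_prior = stats.uniform(0, 1).logpdf(additional_param)           # Uniform(0, 1) prior term
lp_custom = additional_param + -1.5 * np.log(1 + custom_val**2)   # my_logp(custom_val, additional_param)
lp_like = stats.norm(loc=custom_val, scale=1).logpdf(data).sum()  # Normal likelihood of the data

joint_logp = lp_prior + lp_custom + lp_like   # the sum the sampler works with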

I think I got all that. Someone can correct me if the above is off. The use of DensityDist is complicated because it's so flexible.

This looks correct. I do have one question: shouldn’t pm.DensityDist have an observed argument, which corresponds to the first argument of my_logp? How else would my_logp know what to use for its first argument? Thanks.

Yes, true. I was talking about the missing “observed” argument in the call to DensityDist.

The observed kwarg is passed when creating a DensityDist variable; it is not passed to the user-defined logp function. So this still works:

custom = pm.DensityDist('custom',
                        additional_param,
                        logp=my_logp,
                        observed=data)

And the observed data is still passed to the user-defined logp function as the first, unnamed argument. The logp function does not know whether the value passed in (x in the my_logp() function I defined above) is “observed data” or a value sampled from some latent variable (e.g., a value drawn from the additional_param parameter above), and it shouldn't really care.

Thanks. Makes sense.