Writing a DensityDist for a Geometric-like likelihood

Hi,

I’m having a hard time writing my own distribution using DensityDist.

I’m trying to model several “geometric-like” random variables, d_i, where each trial’s success probability p_j is different and sampled from a Beta distribution. The probability mass function of such a “geometric” random variable is:

P(X=n) = p_n \prod_{j=1}^{n-1} (1 - p_j)

where n is the number of trials until the first success. In my case, n is observed, and, very importantly, so are the per-trial rates (the ps / flat_probabilities in the code below). I have several such random variables d_i, each with many p_j sampled from the same Beta distribution. I’m therefore trying to write a model that fits the Beta parameters, a and b, together with an intercept and a weight:

p_j = \text{invlogit}(\text{intercept} + \text{weight} \cdot r_j), \quad r_j \sim \text{Beta}(a, b)
P(d_i = n) = p_n \prod_{j=1}^{n-1} (1 - p_j)
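
To make that generative story concrete, here is a minimal NumPy sketch of it (the values of a, b, intercept and weight are arbitrary placeholders, not fitted quantities):

import numpy as np
from scipy.special import expit

rng = np.random.default_rng(0)
a, b, intercept, weight = 2.0, 5.0, -1.0, 0.5  # placeholder values

def draw_d(rng, max_trials=10_000):
    # Draw a fresh p_j each trial until the first success; return the trial count.
    for n in range(1, max_trials + 1):
        p_j = expit(intercept + weight * rng.beta(a, b))
        if rng.binomial(1, p_j):
            return n
    return max_trials

d = [draw_d(rng) for _ in range(5)]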

As a test, I managed to write the following using a DensityDist (even though one isn’t necessary here), with a logp that simply returns a Geometric’s logp, and it works:

import numpy as np
import pymc as pm

rng = np.random.default_rng()

# Simulate data: one success probability per observation.
ps = [0.1, 0.2, 0.5, 0.6, 0.1]

# Number of trials until the first success for each rate.
ns = [rng.geometric(p) for p in ps]

# Build model
with pm.Model() as model:
    alpha = pm.HalfNormal("alpha", sigma=1)
    beta = pm.HalfNormal("beta", sigma=1)
    weight = pm.Normal('weight', mu=0, sigma=1)
    intercept = pm.Normal('intercept', mu=0, sigma=1)

    observed_rates = pm.Beta(
        "observed_rates",
        alpha=alpha,
        beta=beta,
        observed=ps,
    )

    rates = pm.math.invlogit(intercept + weight * observed_rates)

    # The custom logp just delegates to the built-in Geometric logp.
    def logp(observed, probabilities):
        return pm.Geometric.logp(observed, p=probabilities)

    likelihood = pm.DensityDist('rates', rates, logp=logp, observed=ns)

    idata = pm.sample()
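
As a quick standalone sanity check of that logp (I’m assuming pm.logp, PyMC v4’s helper for evaluating a distribution’s log-probability, is already available in this beta):

# The custom logp should match PyMC's built-in Geometric logp exactly.
geom_logp = pm.logp(pm.Geometric.dist(p=0.3), 4).eval()
manual_logp = np.log(0.3) + 3 * np.log(1 - 0.3)  # log p + (n-1) log(1-p), n = 4
np.testing.assert_allclose(geom_logp, manual_logp)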

With help from this community on the logp function, I managed to write the following model, but it does not work: it only runs if I comment out the observed=... line in the Beta distribution.

import aesara as ae
import aesara.tensor as at
import scipy as sp
import scipy.special  # so that sp.special.expit resolves below

# Simulated data: five random variables, d_i, each with different probabilities p_j.
probabilities = [
    [0.0637785, 0.6502736, 0.6969157],
    [0.2021510, 0.0000388, 0.0025692, 0.0000002, 0.1170427, 0.6897584],
    [0.1145798, 0.0003131, 0.7293184, 0.0031975, 0.3469338, 0.0306480,
     0.6284305, 0.4975889, 0.1520346, 0.2352547, 0.1397242, 0.4827984,
     0.0596149, 0.0094145, 0.3882519],
    [0.0575668, 0.3347680],
    [0.0646256]
]

# Number of trials for each of the five random variables, d_i.
n_trials = np.array([len(p) for p in probabilities])
# Flatten the nested list into a single array.
flat_probabilities = np.concatenate(probabilities)
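
As a sanity check on the bookkeeping, the starting offset of each d_i’s block within flat_probabilities (the same cursors recomputed inside logp below) can be verified up front:

# Offsets of each d_i's slice in flat_probabilities:
# n_trials = [3, 6, 15, 2, 1]  ->  cursors = [0, 3, 9, 24, 26]
cursors = np.r_[[0], n_trials[:-1]].cumsum()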

# Custom logp for a single d_i: log(p_n) + sum of log(1 - p_j) for j < n,
# where `cursor` is the offset of this d_i's rates in the flat array.
def vargeometric(n, cursor, rates):
    return pm.math.log(rates[cursor + n - 1]) + pm.math.log(1 - rates[cursor:cursor + n - 1]).sum()
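
Evaluated on its own with plain NumPy inputs, this reproduces the closed form above, e.g. for the first d_i (n = 3, cursor = 0):

# log P(d_1 = 3) = log(p_3) + log(1 - p_1) + log(1 - p_2)
check = vargeometric(3, 0, flat_probabilities).eval()
manual = np.log(flat_probabilities[2]) + np.log(1 - flat_probabilities[:2]).sum()
np.testing.assert_allclose(check, manual)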

with pm.Model() as model:
    alpha = pm.HalfNormal("alpha", sigma=1)
    beta = pm.HalfNormal("beta", sigma=1)
    weight = pm.Normal('weight', mu=0, sigma=1)
    intercept = pm.Normal('intercept', mu=0, sigma=1)

    observed_rates = pm.Beta(
        "observed_rates",
        alpha=alpha,
        beta=beta,
        observed=flat_probabilities,
        size=(n_trials.sum(), ),
    )

    rates = pm.math.invlogit(intercept + weight * observed_rates)

    def logp(observed, probabilities):
        # Force `observed` (the n_trials) into concrete integers so the
        # cursors can be computed with NumPy.
        observed = observed.astype(int).eval()
        cursors = np.r_[[0], observed[:-1]].cumsum()
        # Scan over the random variables, using `observed` and `cursors` to
        # figure out which slice of `probabilities` pertains to which
        # random variable from `observed_rates`.
        p_trials, _ = ae.scan(
            vargeometric,
            sequences=[at.as_tensor_variable(observed), at.as_tensor_variable(cursors)],
            non_sequences=[probabilities],
        )
        return p_trials

    def random(rates, rng=None, size=None):
        trials = []
        for _ in range(len(n_trials)):
            # Keep drawing a fresh rate each trial until the first success.
            n = 0
            while True:
                p = rng.beta(alpha.eval(), beta.eval())
                p = sp.special.expit(p * weight.eval() + intercept.eval())
                n += 1
                if rng.binomial(1, p):
                    break
            trials.append(n)
        return np.array(trials)

    likelihood = pm.DensityDist('rates', rates, logp=logp, random=random, observed=n_trials)

    idata = pm.sample()

See below for the full error. Some questions I have:

  1. Any ideas what I’m doing wrong?
  2. Is this overly complicated for what I’m trying to achieve?
  3. Will I run into the same kind of issues trying to write random?
  4. Is there any other way I could model this using ready-made distributions from PyMC instead?

Many thanks.

This is the error I get:

AssertionError: 
Apply node that caused the error: for{cpu,scan_fn&grad_of_scan_fn}(TensorConstant{5}, TensorConstant{[ 2  8 23 25 26]}, TensorConstant{[ 0  3  9 24 26]}, TensorConstant{[26 25 23  8  2]}, TensorConstant{[ 1  2 15  6  3]}, TensorConstant{[26 24  9  3  0]}, TensorConstant{(2, 27) of 0.0}, TensorConstant{5}, Elemwise{Composite{sigmoid((i0 + (i1 * i2)))}}.0)
Toposort index: 15
Inputs types: [TensorType(int64, ()), TensorType(int64, (5,)), TensorType(int64, (5,)), TensorType(int64, (5,)), TensorType(int64, (5,)), TensorType(int64, (5,)), TensorType(float64, (2, 27)), TensorType(int64, ()), TensorType(float64, (None,))]

Debug print of the apply node: 
for{cpu,scan_fn&grad_of_scan_fn}.0 [id A] <TensorType(float64, (None, None))> ''   
 |TensorConstant{5} [id B] <TensorType(int64, ())>
 |TensorConstant{[ 2  8 23 25 26]} [id C] <TensorType(int64, (5,))>
 |TensorConstant{[ 0  3  9 24 26]} [id D] <TensorType(int64, (5,))>
 |TensorConstant{[26 25 23  8  2]} [id E] <TensorType(int64, (5,))>
 |TensorConstant{[ 1  2 15  6  3]} [id F] <TensorType(int64, (5,))>
 |TensorConstant{[26 24  9  3  0]} [id G] <TensorType(int64, (5,))>
 |TensorConstant{(2, 27) of 0.0} [id H] <TensorType(float64, (2, 27))>
 |TensorConstant{5} [id B] <TensorType(int64, ())>
 |Elemwise{Composite{sigmoid((i0 + (i1 * i2)))}} [id I] <TensorType(float64, (None,))> ''   
   |InplaceDimShuffle{x} [id J] <TensorType(float64, (1,))> ''   
   | |intercept [id K] <TensorType(float64, ())>
   |InplaceDimShuffle{x} [id L] <TensorType(float64, (1,))> ''   
   | |weight [id M] <TensorType(float64, ())>
   |observed_rates{[6.3778450..25607e-02]} [id N] <TensorType(float64, (27,))>
for{cpu,scan_fn&grad_of_scan_fn}.1 [id A] <TensorType(float64, (None,))> 'rates_logprob'   

Inner graphs:

for{cpu,scan_fn&grad_of_scan_fn}.0 [id A] <TensorType(float64, (None, None))> ''   
 >Elemwise{add,no_inplace} [id O] <TensorType(float64, (None,))> ''   
 > |Elemwise{add,no_inplace} [id P] <TensorType(float64, (None,))> ''   
 > | |IncSubtensor{Inc;int64} [id Q] <TensorType(float64, (None,))> ''   
 > | | |TensorConstant{(27,) of 0.0} [id R] <TensorType(float64, (27,))>
 > | | |Elemwise{true_div} [id S] <TensorType(float64, ())> ''   
 > | | | |Subtensor{int64} [id T] <TensorType(float64, ())> ''   
 > | | | | |TensorConstant{(5,) of 1.0} [id U] <TensorType(float64, (5,))>
 > | | | | |ScalarConstant{0} [id V] <int64>
 > | | | |Subtensor{int64} [id W] <TensorType(float64, ())> ''   
 > | | |   |<TensorType(float64, (None,))> [id X] <TensorType(float64, (None,))> -> [id I]
 > | | |   |ScalarFromTensor [id Y] <int64> ''   
 > | | |     |<TensorType(int64, ())> [id Z] <TensorType(int64, ())> -> [id E]
 > | | |ScalarFromTensor [id Y] <int64> ''   
 > | |IncSubtensor{Inc;int64:int64:} [id BA] <TensorType(float64, (None,))> ''   
 > |   |TensorConstant{(27,) of 0.0} [id R] <TensorType(float64, (27,))>
 > |   |Elemwise{neg} [id BB] <TensorType(float64, (None,))> ''   
 > |   | |Elemwise{true_div} [id BC] <TensorType(float64, (None,))> ''   
 > |   |   |Elemwise{second} [id BD] <TensorType(float64, (None,))> ''   
 > |   |   | |Elemwise{log,no_inplace} [id BE] <TensorType(float64, (None,))> ''   
 > |   |   | | |Elemwise{sub,no_inplace} [id BF] <TensorType(float64, (None,))> ''   
 > |   |   | |   |TensorConstant{(1,) of 1} [id BG] <TensorType(int8, (1,))>
 > |   |   | |   |Subtensor{int64:int64:} [id BH] <TensorType(float64, (None,))> ''   
 > |   |   | |     |<TensorType(float64, (None,))> [id X] <TensorType(float64, (None,))> -> [id I]
 > |   |   | |     |ScalarFromTensor [id BI] <int64> ''   
 > |   |   | |     | |<TensorType(int64, ())> [id BJ] <TensorType(int64, ())> -> [id G]
 > |   |   | |     |ScalarFromTensor [id BK] <int64> ''   
 > |   |   | |       |<TensorType(int64, ())> [id Z] <TensorType(int64, ())> -> [id E]
 > |   |   | |Subtensor{int64} [id BL] <TensorType(float64, (1,))> ''   
 > |   |   |   |TensorConstant{(5, 1) of 1.0} [id BM] <TensorType(float64, (5, 1))>
 > |   |   |   |ScalarConstant{0} [id BN] <int64>
 > |   |   |Elemwise{sub,no_inplace} [id BF] <TensorType(float64, (None,))> ''   
 > |   |ScalarFromTensor [id BI] <int64> ''   
 > |   |ScalarFromTensor [id BK] <int64> ''   
 > |<TensorType(float64, (None,))> [id BO] <TensorType(float64, (None,))> -> [id H]
 >Elemwise{add,no_inplace} [id BP] <TensorType(float64, ())> ''   
 > |Elemwise{log,no_inplace} [id BQ] <TensorType(float64, ())> ''   
 > | |Subtensor{int64} [id BR] <TensorType(float64, ())> ''   
 > |   |<TensorType(float64, (None,))> [id X] <TensorType(float64, (None,))> -> [id I]
 > |   |ScalarFromTensor [id BS] <int64> ''   
 > |     |<TensorType(int64, ())> [id BT] <TensorType(int64, ())> -> [id C]
 > |Sum{acc_dtype=float64} [id BU] <TensorType(float64, ())> ''   
 >   |Elemwise{log,no_inplace} [id BV] <TensorType(float64, (None,))> ''   
 >     |Elemwise{sub,no_inplace} [id BW] <TensorType(float64, (None,))> ''   
 >       |TensorConstant{(1,) of 1} [id BG] <TensorType(int8, (1,))>
 >       |Subtensor{int64:int64:} [id BX] <TensorType(float64, (None,))> ''   
 >         |<TensorType(float64, (None,))> [id X] <TensorType(float64, (None,))> -> [id I]
 >         |ScalarFromTensor [id BY] <int64> ''   
 >         | |<TensorType(int64, ())> [id BZ] <TensorType(int64, ())> -> [id D]
 >         |ScalarFromTensor [id CA] <int64> ''   
 >           |<TensorType(int64, ())> [id BT] <TensorType(int64, ())> -> [id C]

for{cpu,scan_fn&grad_of_scan_fn}.1 [id A] <TensorType(float64, (None,))> 'rates_logprob'   
 >Elemwise{add,no_inplace} [id O] <TensorType(float64, (None,))> ''   
 >Elemwise{add,no_inplace} [id BP] <TensorType(float64, ())> ''   

HINT: Use a linker other than the C linker to print the inputs' shapes and strides.
HINT: Re-running with most Aesara optimizations disabled could provide a back-trace showing when this node was created. This can be done by setting the Aesara flag 'optimizer=fast_compile'. If that does not work, Aesara optimizations can be disabled with 'optimizer=None'.
HINT: Use the Aesara flag `exception_verbosity=high` for a debug print-out and storage map footprint of this Apply node.

Library versions:

pymc == 4.0.0b6
aesara == 2.5.1
numpy == 1.19.5