I think my example was unhelpful; I just meant to show how one can implement the scan in Aesara using dummy variables. When you actually implement it, you don’t want to compile the Aesara function first, otherwise you might as well just write it in NumPy and skip the headache. I guess the difference between pm.Potential and pm.DensityDist is that one takes a symbolic tensor and the other takes a numpy function? I’m not sure. It was quite late when I posted yesterday; using pm.Potential didn’t occur to me. Taking that approach, here’s what I came up with:
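For what it’s worth, here’s a minimal sketch of how I understand the two being called. Treat the pm.DensityDist signature (name, then distribution parameters, then a logp keyword taking value and those parameters as tensors) as my assumption about the v4 API rather than something I’ve verified; pm.Potential I’m fairly confident about.

import numpy as np
import pymc as pm

with pm.Model():
    mu = pm.Normal("mu")
    # pm.Potential: hand the model an arbitrary symbolic log-probability term
    pm.Potential("soft_constraint", -mu**2)
    # pm.DensityDist: wrap a logp function so it behaves like a distribution
    # (signature here is my assumption, not checked against the docs)
    pm.DensityDist(
        "y",
        mu,  # forwarded to logp as a tensor
        logp=lambda value, mu: -0.5 * (value - mu) ** 2,
        observed=np.zeros(3),
    )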
import numpy as np
import aesara
import aesara.tensor as at
import pymc as pm
from scipy import stats
import arviz as az


def gen_geometric_rvs(rates, n_rv):
    """Draw n_rv geometric-like variables where every trial has its own rate."""
    n_trials = np.full(n_rv, np.nan)
    idx = 0
    for i in range(n_rv):
        # Keep flipping coins, each with the next rate, until the first success
        for n, p in enumerate(rates[idx:], start=1):
            is_success = stats.bernoulli(p=p).rvs()
            idx += 1
            if is_success:
                n_trials[i] = n
                break
    return n_trials.astype('int')


def vargeometric(n, cursor, rates):
    # Log-likelihood of the first success landing on trial n when each trial
    # has its own rate: log(rates[cursor + n - 1]) for the success, plus
    # sum(log(1 - rates[i])) over the n - 1 failures that precede it.
    return pm.math.log(rates[cursor + n - 1]) + pm.math.log(1 - rates[cursor:cursor + n - 1]).sum()


true_alpha = 0.5
true_beta = 1.4

# I draw many more rates than needed to make sure gen_geometric_rvs doesn't leave any nans
many_rates = stats.beta(a=true_alpha, b=true_beta).rvs(100)
n_trials = gen_geometric_rvs(many_rates, 5)
cursors = np.r_[[0], n_trials[:-1]].cumsum()

# Recover only the rates that were used to compute the trials, to compare with the MCMC result
true_rates = [many_rates[cursor:cursor + n] for n, cursor in zip(n_trials, cursors)]

with pm.Model() as model:
    alpha = pm.HalfNormal("alpha", sigma=1)
    beta = pm.HalfNormal("beta", sigma=1)
    rates = pm.Beta(
        "rates",
        alpha=alpha,
        beta=beta,
        size=(n_trials.sum(),),
    )
    # scan applies vargeometric to each (n, cursor) pair, threading the full
    # symbolic rates vector through as a non-sequence
    logp_array, _ = aesara.scan(
        vargeometric,
        sequences=[at.as_tensor_variable(n_trials), at.as_tensor_variable(cursors)],
        non_sequences=rates,
    )
    likelihood = pm.Potential('likelihood', logp_array.sum())
    trace = pm.sample(cores=6)
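To compare the posterior against true_alpha, true_beta, and true_rates, something along these lines (a sketch using the arviz import above) does the job:

# Summarize the hyperparameter posteriors next to the true values
print(az.summary(trace, var_names=["alpha", "beta"]))

# Trace plots to eyeball mixing and convergence
az.plot_trace(trace, var_names=["alpha", "beta"])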
Here are the results I got.