Model with high-dimensional stochastic vectors and broadcasting in distributions

Hi.

I’m a newbie in PyMC and am trying to model a problem:

  • There are K sheep that are counting humans, where K is an RV from Bayesian inference
  • Each individual sheep has a max human counter before it falls asleep; this is stored in the vector t \in \mathbb{R}^K
  • Sheep are unobservant, so they have a probability p_c of detecting a human passing by and incrementing their count
  • There are l humans passing by in total
  • The matrix c \in \mathbb{R}^{l\times K} keeps track of each sheep’s counter at the point of each human’s passing
  • For each passing, the number of sheep falling asleep is observed and stored in a data set, and it is deterministically given by M_j=\sum_{i=1}^{K}{I(c_{ij}=t_i)}, where I is the indicator function. M_j is then used as a mean in a Binomial distribution into which the observations are passed
  • The goal is to do inference on the distribution of max human counters t
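To make the setup concrete, here is a plain NumPy forward simulation of the process (with hypothetical fixed values for K, p_c and c_param, since they are RVs in the actual model; M_j is computed per the indicator-sum definition above):

```python
import numpy as np

rng = np.random.default_rng(42)

K = 25            # number of sheep (fixed here; an RV in the real model)
l = 6             # number of humans passing by
p_c = 0.3         # detection probability (hypothetical value)
c_param = 1 / 20  # geometric parameter for the max counters

t = rng.geometric(c_param, size=K)  # max counter per sheep
c = np.zeros(K, dtype=int)          # running counter per sheep

M = []
for j in range(1, l + 1):
    counting = c < t                       # sheep at their max have fallen asleep
    c += counting & (rng.random(K) < p_c)  # each awake sheep detects with prob p_c
    M.append(int(np.sum(c == t)))          # M_j per the indicator-sum definition

print(M)  # number of sheep asleep at each passing
```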

The desired behaviour in more traditional Python code is given by:

import numpy as np
import pymc as pm
import aesara.tensor as at

m = [24, 14, 5, 2, 0, 0] # Observed data (number of sheep falling asleep at each passing)
l = len(m)               # Total number of humans that passed by

c_param = 1/20  # Parameter for a priori geometric distribution on max counter for each sheep
p_c = 0.4       # Detection probability

# Parameters for Binomial distribution approximating the normal:
p = lambda mu, sd: 1 - sd**2/mu
n = lambda mu, sd: mu / p(mu, sd)

sd = 1.2 # Standard deviation to be used in said distribution

# For the sampler:
n_samples = 5_000
tune = 2_500
with pm.Model() as max_counter_model:
    # Amount of sheep K:
    K = pm.Geometric("K", 1/5_000)
    p_c = pm.Beta("p_c", alpha=11.20, beta=26.13) # Assuming the encounter probability to be centered around 0.3
    like = pm.Binomial("like", p=p_c, n=K, observed=np.sum(m))

    # Human counter:
    t = pm.Geometric("t", c_param, size=K) # Max counter for each sheep
    c = []                                 # Counter for each sheep at each human passing

    for j in range(1, l+1):
        c.append( pm.Binomial(f"c_{j}", p=p_c, n=j, size=K) )
        m_det = at.sum(at.eq(c[j-1], t))
        like = pm.Binomial(f"like_{j}", p=p(m_det, sd), n=n(m_det, sd), observed=m[j-1])

    idata = pm.sample(draws=n_samples, tune=tune)

but this obviously does not work.

My questions are:

1. How can I define c as a tensor and broadcast in each human passing? I want to achieve something like

pm.Binomial("c", p=p_c, n=at.arange(1, l+1), size=(l, K))

such that each row in c is a K-dimensional vector of binomially distributed variables. Defining it like the above yields a broadcasting error; defining it as the transpose gives a shape error further down the line.
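I suspect this is NumPy-style broadcasting alignment, which I can reproduce with plain NumPy as a stand-in: shapes align from the right, so a parameter of shape (l,) broadcasts against (K, l) but not against (l, K):

```python
import numpy as np

l, K = 6, 25
n_param = np.arange(1, l + 1)  # shape (l,), like the n parameter above

# Aligning from the right, (l,) broadcasts against (K, l) ...
print(np.broadcast_shapes(n_param.shape, (K, l)))  # -> (25, 6)

# ... but not against (l, K):
try:
    np.broadcast_shapes(n_param.shape, (l, K))
except ValueError as err:
    print("broadcast error:", err)
```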

2. Say I do manage to define c appropriately. How can I iterate through the rows and calculate M_j? Do I have to use aesara.scan, or can I do it with a regular for-loop somehow?
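What I want to compute, sketched here with NumPy stand-ins for at.eq and at.sum, is essentially one broadcasted expression giving every M_j at once:

```python
import numpy as np

rng = np.random.default_rng(0)
K, l = 4, 3
t = rng.integers(1, 5, size=K)       # (K,) max counters
c = rng.integers(0, 5, size=(K, l))  # (K, l) counter per sheep per passing

# Broadcast t to a column, compare elementwise, sum over the sheep axis:
M = np.sum(c == t[:, None], axis=0)  # shape (l,), one M_j per passing
print(M)
```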

3. Is it even possible to do inference this way, using M_j as a mean in a binomial likelihood? The reason I’m doing it is that I know observed deterministics are not supported, and this is my workaround. I have no idea if it works though.

I would really appreciate any help, or someone just telling me it’s impossible. Criticism of my PyMC skills is also very welcome!

Thank you for reading 🙂


Regarding point 1 alone, does this guide help? Distribution Dimensionality — PyMC 4.1.3 documentation


Yes, this helped a lot. Thank you!
My issue now is that m_det is usually zero, which pm.Binomial apparently does not like. The code runs when I set c_param above ~0.6 and exclude the evaluation of the binomial likelihood like_1 (which usually has the smallest observed value).
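That failure makes sense given the moment-matching behind the p and n lambdas: solving n*p = mu and n*p*(1-p) = sd^2 gives p = 1 - sd^2/mu, which requires sd^2 < mu. So m_det = 0 makes p undefined, and any mean at or below sd^2 makes it negative, i.e. an invalid Binomial parameter:

```python
# Moment-matching Binomial(n, p) to mean mu and standard deviation sd:
p = lambda mu, sd: 1 - sd**2 / mu  # from n*p*(1-p) = sd^2 combined with n*p = mu
n = lambda mu, sd: mu / p(mu, sd)  # from n*p = mu

mu, sd = 24.0, 1.2
print(n(mu, sd) * p(mu, sd))                    # recovers the mean mu = 24.0
print(n(mu, sd) * p(mu, sd) * (1 - p(mu, sd)))  # recovers the variance sd^2 = 1.44

print(p(0.5, 1.2))  # mu <= sd^2 gives p < 0, an invalid Binomial parameter
```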

I figured out the answers to questions 2 and 3:
2. Iterating in a for-loop like in the code should be possible, but I ran into a lot of issues with invalid sampling and -inf likelihoods in the binomial distributions inside the loop
3. Yes, yes it is (or at least it seems so)

My main problem now is the stochasticity of K, which I made a new thread for.

My modified code which (mostly) works:

import numpy as np
import pymc as pm
import aesara.tensor as at

m = [10, 7, 4] # Observed data
l = len(m)     # Total number of humans passing by

c_param = 0.7  # Parameter for a priori geometric distribution on max counter for each sheep
p_c = 0.4 # Encounter probability

# Parameters for Binomial distribution approximating the normal:
p = lambda mu, sd: 1 - sd**2/mu
n = lambda mu, sd: mu / p(mu, sd)

sd = 5 # Standard deviation to be used in said distribution

# For the sampler:
n_samples = 5_000
tune = 2_500
with pm.Model() as max_counter_model:
    # Amount of sheep K:
    # K = pm.Geometric("K", 1/10)
    K = 25 # Fixing K as a temporary solution
    p_c = pm.Beta("p_c", alpha=11.20, beta=26.13) # Assuming the encounter probability to be centered around 0.3
    like = pm.Binomial("like", p=p_c, n=K, observed=np.sum(m))

    # Human counter:
    t = pm.Geometric("t", c_param, size=(K,))                      # Max counter for each sheep
    c = pm.Binomial("c", p=p_c, n=at.arange(1, l+1), size=(K, l)) # Counter for each sheep at each passing

    for j in range(l, 1, -1): # Excluded j=1 temporarily because it yields an error related to the binomial distribution
        m_det = pm.Deterministic(f"m_det_{j}", at.sum(at.eq(c[:, j-1], t)))
        like = pm.Binomial(f"like_{j}", p=p(m_det, sd), n=n(m_det, sd), observed=m[j-1])

    idata = pm.sample(draws=n_samples, tune=tune)