Hi.
I’m a newbie in PyMC an am trying to model a problem:
- There are K sheep that are counting humans, where K is a RV from Bayesian inference
- Each individual sheep has a max human counter before they fall asleep, this is stored in the vector t \in \mathbb{R}^K
- Sheep are unobservant, so they have a probability p_c of detecting a human passing by and incrementing their count
- There are l humans passing by in total
- The matrix c\in \mathbb{R}^{l\times K} keeps track of each sheeps counter at the point of each humans passing
- For each passing, the amount of sheep falling asleep is observed and stored in a data set, and it is deterministically given by M_j=\sum_{i=1}^{K}{I(c_{ij}=t_i)}, where I is the indicator function. M_j is then used as a mean in a Binomial distribution into which the observations are passed
- The goal is to do inference on the distribution of max human counters t
The desired behaviour in more traditional Python code is given by:
m = [24, 14, 5, 2, 0, 0] # Observed data (amount of sheep falling asleep in each passing)
l = len(m) # Total amount of humans passed by
c_param = 1/20 # Parameter for a priori geometric distribution on max counter for each sheep
p_c = 0.4 # Detection probability
# Parameters for Binomial distribution approximating the normal:
p = lambda mu, sd: 1 - sd**2/mu
n = lambda mu, sd: mu / p(mu, sd)
sd = 1.2 # Standard deviation to be used in said distribution
# For the sampler:
n_samples = 5_000
tune = 2_500
with pm.Model() as max_counter_model:
# Amount of sheep K:
K = pm.Geometric("K", 1/5_000)
p_c = pm.Beta("p_c", alpha=11.20, beta=26.13) # Assuming the encounter probability to be centered around 0.3
like = pm.Binomial("like", p=p_c, n=K, observed=np.sum(m))
# Human counter:
t = pm.Geometric("t", c_param, size=K) # Max counter for each sheep
c = [] # Counter for each sheep at each human passing
for j in range(1, l+1):
c.append( pm.Binomial(f"c_{j}", p=p_c, n=j, size=K) )
m_det = at.sum(at.eq(c[j-1], t))
like = pm.Binomial(f"like_{j}", p=p(m_det, sd), n=n(m_det, sd), observed=m[j-1])
idata = pm.sample(draws=n_samples, tune=tune)
but this obviously does not work.
My questions are:
1
How can I define c as a tensor and broadcast in each human passing? I want to achieve something like
pm.Binomial(f"c", p=p_c, n=at.arange(1, l+1), size=(l, K))
such that each row in C is a K-dimensional vector of binomially distributed variables. Defining it like above yields a broadcasting error, defining it as the transpose gives some shape error down the line.
2
Say I do manage to define c appropriately. How can I iterate through the rows and calculate M_j? Do I have to use the aesara.scan
function or can I do it with a regular for-loop somehow?
3
Is it even possible to do inference this way using M_j as a mean in a binomial likelihood. The reason I’m doing it is because I know that observed deterministics are not supported, and this is my workaround. I have no idea if it works though.
I would really appreciate any help or someone just telling me it’s impossible. Critizism to my PyMC skills are also very welcome!
Thank you for reading