Thank you for all of the help.
I am attaching a toy example to illustrate what I believe are problems with this approach. Note that t, n, and p are assumed to be known, i.e. we know the sampling time in seconds, the number of people sampled, and the true number of people in the population. What we do not know are the number of viewing seconds (viewing_seconds) derived from the sample and the true average proportion of the population for each event (r).
toy_events_data.csv (2.1 KB)
toy_example.py (2.8 KB)
From the correlation plot, we see a strong correlation between n and viewing_seconds, since t and p are essentially constant.
So I can use either the recorded viewing seconds or n as the observed data, set up a Poisson likelihood with rate r * p * t, and place a Gamma prior on r (I have some prior knowledge about the behavior of these r's).
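For reference, this is roughly the model I have in mind, using the measured viewing seconds as the observed (just a minimal sketch assuming PyMC; the column names and the Gamma hyperparameters here are placeholders for illustration, not my actual data or prior):

```python
import pandas as pd
import pymc as pm

df = pd.read_csv("toy_events_data.csv")
t = df["t"].to_numpy()                               # sampling time in seconds (known)
p = df["p"].to_numpy()                               # true population size (known)
viewing_seconds = df["viewing_seconds"].to_numpy()   # measured from the sample

with pm.Model() as model:
    # One r per event: the average proportion of the population in attendance
    r = pm.Gamma("r", alpha=2.0, beta=50.0, shape=len(df))

    # Poisson likelihood with rate r * p * t on the measured viewing seconds
    pm.Poisson("obs", mu=r * p * t, observed=viewing_seconds)

    trace = pm.sample(step=pm.Metropolis())
```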
The problem comes when interpreting the trace for r after sampling with Metropolis. If I use the measured viewing seconds as the observed data, I get the correct interpretation of r (the average proportion of the population in attendance), but there is no uncertainty around these values. If I instead use n as the observed data, I get uncertainty around r, but in the posterior predictive check I lose the ability to recover the inferred viewing seconds of the event, since n is not directly proportional to the viewing seconds. Worse, I lose the interpretation of r itself.
I would prefer, if it is somehow possible, to use the measured viewing seconds in the likelihood but to place uncertainty around them proportional to the number of people in the sample.
To clarify, in the first two records I have r values of 0.037624706 and 0.014726273, but the first r value was measured with a sample of 148 people compared to 44 people for the second. I want to encode the belief that I trust the r value measured using 148 people more than the one measured using 44 people.
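To make that concrete, here is a rough sketch of the kind of dependence on n that I mean (purely illustrative, with the same placeholder column names and prior as above; it assumes the observed viewing_seconds is the raw total over the n sampled people, so the exposure becomes n * t instead of p * t):

```python
import pandas as pd
import pymc as pm

df = pd.read_csv("toy_events_data.csv")

with pm.Model() as weighted_model:
    r = pm.Gamma("r", alpha=2.0, beta=50.0, shape=len(df))  # placeholder prior

    # If viewing_seconds is the total over the n sampled people, the expected
    # count is r * n * t, so an event measured with a larger sample constrains
    # its r more tightly (by Gamma-Poisson conjugacy the posterior rate
    # parameter grows by n * t).
    pm.Poisson(
        "sample_viewing_seconds",
        mu=r * df["n"].to_numpy() * df["t"].to_numpy(),
        observed=df["viewing_seconds"].to_numpy(),
    )

    trace = pm.sample(step=pm.Metropolis())
```

I am not sure this is the right parameterization, but it captures the behavior I want: the event measured with 148 people would end up with a tighter posterior on r than the one measured with 44.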
Does that make sense?