Combining different aggregations in one model

I have data with which I am trying to predict how much of something we will have on a given day.

We know how many things we put into progress, and we want to learn the success/failure rate as well as the typical amount per individual. In a sense, we are trying to learn

y-estimate = N * failure_rate * individual_estimate

N - number of things
failure_rate - the fraction of things that will get to the finish line
individual_estimate - what is the average value of these things

I want to learn:

  1. failure rate as a binomial (aggregating the successes and failures)
  2. individual estimate by looking at each element (removing failures)

I am having trouble understanding whether you can learn individual model parameters, each with a different aggregation, and combine them into one larger model. Thoughts?
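To make the question concrete, here is a minimal numpy sketch of the process described above, followed by the two separate estimates combined into one prediction. The true values, the number of days, and the gamma distribution for per-item values are all assumptions chosen for illustration; nothing in the problem statement fixes them.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "true" values, chosen only for illustration
n_days = 200            # number of observed days (assumption)
N = 10                  # things put into progress each day
true_rate = 0.8         # fraction that reaches the finish line
true_individual = 20.0  # average value of a finished thing

# How many things reach the finish line on each day
n_success = rng.binomial(N, true_rate, size=n_days)

# Value of each finished thing: a gamma distribution with mean
# true_individual is an assumption, any positive distribution would do
shape = 4.0
values = [rng.gamma(shape, true_individual / shape, size=k) for k in n_success]
y = np.array([v.sum() for v in values])  # observed daily totals

# 1. rate from the aggregated successes and failures
rate_hat = n_success.sum() / (N * n_days)

# 2. individual estimate from each finished element (failures removed)
individual_hat = np.concatenate(values).mean()

# Combine the two separately learned pieces into one prediction
y_hat = N * rate_hat * individual_hat
print(rate_hat, individual_hat, y_hat, y.mean())
```

With these plug-in estimates the combined prediction equals the average observed daily total, because the total number of successes cancels between the two factors. A full Bayesian model would learn both pieces jointly instead of in two steps.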

I think you can formulate this as a logistic regression? In that case, y-estimate = N * failure_rate * individual_estimate becomes y-estimate = Binomial(N, exp(log_failure_rate + log_individual_estimate)), where log_failure_rate and log_individual_estimate are unknown parameters that you put priors on and combine in a linear equation.
It would make things clearer if you could write down the simulation process in numpy etc.

Thank you. I am not sure I follow completely. I tried writing this in a numpy process and I didn’t get the correct estimate. So to clarify in the above problem:

N = 10
individual_estimate = 20
N_success = 8

So the y-value we would be learning is 20*8 = 160 = N*failure_rate*individual_estimate, with failure_rate = N_success/N = 0.8.

In the case where I try to use these values to simulate an estimate, I get an error that the p value for the Binomial is not between 0 and 1. Sorry if I am misunderstanding.
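A small sketch of what is probably happening, assuming the numbers above were plugged straight into the exp(...) formulation. The exponential of the summed logs is just the product 0.8 * 20, which is not a probability, so numpy rejects it.

```python
import numpy as np

N = 10
N_success = 8
individual_estimate = 20

failure_rate = N_success / N  # 0.8

# exp(log a + log b) is a * b, so this is about 16, far above 1
p = np.exp(np.log(failure_rate) + np.log(individual_estimate))
print(p)

rng = np.random.default_rng(0)
try:
    rng.binomial(N, p)
    raised = False
except ValueError as err:
    # numpy refuses a success probability outside [0, 1]
    raised = True
    print("Binomial rejected p:", err)

# The plain product still gives the target value from the example
y_estimate = N * failure_rate * individual_estimate
print(y_estimate)
```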

I am also wondering whether I could use survival analysis to estimate the failure_rate.

Thanks for your time.

Hi Archie,
What if you use a logistic link in the Binomial? Binomial(N, logistic(logit_failure_rate + logit_individual_estimate)) – that way, the probability of success of the Binomial will be between 0 and 1
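As a rough numpy sketch of this suggestion: the logistic function maps any real-valued sum to a number between 0 and 1, so the Binomial always gets a valid probability. The two logit-scale values below are hypothetical placeholders, not estimates from the data, and the helper name logistic is defined here only for illustration.

```python
import numpy as np

def logistic(x):
    """Inverse logit: maps a real number to the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)

N = 10
# Hypothetical parameter values on the logit scale (assumptions)
logit_failure_rate = 1.0
logit_individual_estimate = 0.4

# The sum can be any real number; the link keeps p a valid probability
p = logistic(logit_failure_rate + logit_individual_estimate)
print(p)

# Sampling now works, since 0 < p < 1
n_success = rng.binomial(N, p, size=1000)
print(n_success.mean() / N)
```

Note that the Binomial outcome here is a count between 0 and N, so this sketch only covers how many things reach the finish line.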