How to model an RV as the sum of other RVs? (without a loop!)

Hi,

I’m trying to build a model that fits the process of filling a cart with goods in a supermarket. I’m only interested in the price (not the actual goods chosen).

The cart filling process is as follows:

  1. A random variable (RV) is drawn to model the number of goods in the cart (a discrete RV: Poisson, Binomial, …). Let’s call it N (number of goods).

  2. N RVs are drawn from a continuous distribution to model the prices of the N selected goods. Let’s call this RV P (price of each good); its dimension is N.

  3. Let S be the sum of all components of P (the total value of the cart).

  4. My observable is a list of cart values, so I model it as a Normal whose mu is S and whose sd has a small prior to account for model error.
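To make the four steps concrete, here is a minimal forward simulation in plain NumPy (not a PyMC model). The Poisson rate, the log-normal price distribution, the noise sd and the function name `simulate_carts` are all assumptions chosen for illustration; the post does not fix any of them.

```python
import numpy as np


def simulate_carts(n_carts, rate=5.0, log_price_mu=1.0, log_price_sigma=0.5,
                   noise_sd=0.05, seed=0):
    """Simulate cart values following steps 1 to 4 (all parameters are assumed)."""
    rng = np.random.default_rng(seed)
    # Step 1: N, the number of goods in each cart (Poisson is one possible choice).
    n_goods = rng.poisson(rate, size=n_carts)
    # Steps 2 and 3: draw N prices per cart and sum them to get S.
    # A Python loop is fine here because this is only a simulation, not the model.
    totals = np.array([
        rng.lognormal(log_price_mu, log_price_sigma, size=n).sum()
        for n in n_goods
    ])
    # Step 4: the observed cart value is S plus a small Normal error.
    observed = rng.normal(totals, noise_sd)
    return n_goods, totals, observed


n_goods, totals, observed = simulate_carts(n_carts=1000)
```

Simulated data like this can be handy for checking whether a candidate model recovers known parameters before trying it on real carts.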

I’m having a hard time coding this process in a PyMC model.

My first problem is at step 2: I can’t use an RV as a shape parameter to draw the desired number of RVs. Using a loop to build RVs is also discouraged, and anyway I guess an RV can’t be used as a loop bound either…

My solution to this is to draw a big enough P vector, then draw N as a binary vector (whose sum is N) and multiply it with P. The 0s skip some values of P and the 1s select exactly N values, and then I sum the result.
It is mathematically correct: it sums N values drawn from my continuous distribution… But it draws a lot of unused samples, which is so time consuming that I can’t really use this trick on real data (and I also wonder whether it may mislead the convergence…(?))
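As a sketch of the masking trick described above, in plain NumPy rather than PyMC: a fixed-size price matrix is drawn, and a 0/1 mask keeps exactly N entries per cart. Building the mask as `arange(n_max) < N` is an assumption (one possible way to get a binary vector that sums to N), and the Gamma prices, the Binomial N and `n_max = 20` are placeholders.

```python
import numpy as np

rng = np.random.default_rng(1)
n_carts, n_max = 4, 20  # n_max: assumed upper bound on the number of goods

# Oversized price matrix: n_max candidate prices per cart (Gamma is a placeholder).
prices = rng.gamma(shape=2.0, scale=1.5, size=(n_carts, n_max))
# N per cart; Binomial(n_max, p) guarantees N <= n_max.
n_goods = rng.binomial(n_max, 0.3, size=n_carts)

# Binary mask with exactly N ones in each row: the first N slots are kept.
mask = (np.arange(n_max) < n_goods[:, None]).astype(float)

# S per cart: masked-out prices contribute 0 to the sum, so no loop is needed.
totals = (prices * mask).sum(axis=1)
```

Every cart pays for `n_max` price draws even when N is small, which is the overhead mentioned above.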

Any help will be appreciated.

I would use a similar solution to yours, maybe with a mixture model to improve convergence. But my question is: do you really need to put the RV P in your model? It is likely really difficult to infer from your data, since you only have a list of cart values as observations. Would it be possible to turn P into a vector of known values (i.e., all the unique price tags)?
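One possible reading of the known-values suggestion, sketched in NumPy: each slot in the cart picks one entry from a fixed list of price tags, and the same masking idea gives the total. The `price_tags` values, the uniform choice among tags and the Binomial N are hypothetical, used only for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
price_tags = np.array([0.99, 1.49, 2.50, 4.00, 7.95])  # hypothetical known prices
n_carts, n_max = 5, 12

# N per cart (Binomial is an assumed choice, bounded by n_max).
n_goods = rng.binomial(n_max, 0.4, size=n_carts)
# For every slot, pick the index of one known price tag (uniformly, as an assumption).
picks = rng.integers(0, len(price_tags), size=(n_carts, n_max))
# Keep only the first N slots of each cart.
mask = np.arange(n_max) < n_goods[:, None]

# Cart totals now depend only on N and on which known tags were picked.
totals = (price_tags[picks] * mask).sum(axis=1)
```

With the prices fixed, the only unknowns left are N and the probabilities of picking each tag, rather than a continuous price for every good.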