How to generate data as a variable?


I have developed a time series sales forecasting model that works better than expected. I’d like to add a variable simulating how many buyers we will have, as our buyer count is predictive of our sales. I know how many buyers are possible in each market because our buyers have to register with us in order to buy our product.

That said, is it as simple as the following:

with pm.Model(coords=coords) as constant_model:
    simulated_buyers = pm.TruncatedNormal('simulated_buyers', mu="registered_buyers.mean", sigma="registered_buyers.std()", upper="registered_buyers.max")
    buyers_coeff = pm.Normal("buyers_coeff", mu=0, sigma=1)

    mu = simulated_buyers * buyers_coeff
    sigma = pm.HalfNormal('sigma', sigma=100)

    eaches = pm.StudentT('predicted_eaches',
                             # lower = 0,

Where “registered_buyers.mean/std” are the mean and standard deviation of our registered buyers on a monthly basis?

Is there anything wrong with taking two RVs and multiplying them together?

You can definitely multiply random variables together, no problem. I guess it’s common in media mix models; see here for an example where random variables are combined in a regression. As it stands, what you wrote is fine. You just have to make sure all the shapes work, since simulated_buyers will inherit the shape of registered_buyers.mean.

Another option would be to make simulated_buyers an observed node, with the number of buyers each month as data, and estimate mu and sigma inside the model. As it stands, you lose some uncertainty because you are computing summary statistics (registered_buyers.mean and .std) outside the model, then plugging them in as fixed values. It might not matter in your application, though.
