Hello,

I have data that looks something like the table excerpt below. Here is some context on how the data is generated. For some period of time (sample_id), a measured quantity (total_sample_input) is fed into a machine whose control knobs are left at specific values (the covariate variables), and the machine sorts all of the input into distinct categories (the output_category variables, which sum to total_sample_input). The split across output categories is believed to follow a Dirichlet distribution whose shape is causally determined by the control knob values. The effect of the covariates on the output categories is believed to be piecewise rather than strictly linear.
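To make the generation story concrete, here is a minimal simulation sketch of it. Everything numeric in it is an assumption for illustration only: the breakpoints, the concentration values, the ranges, and the helper name `concentration` are made up and are not estimated from the real machine.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 5

# Hypothetical knob settings and input totals, loosely matching the table's ranges.
covariates = rng.uniform(0.0, 8.0, size=(n_samples, 3))
total_input = rng.uniform(30_000.0, 46_000.0, size=n_samples)


def concentration(x):
    """Piecewise map from knob values to Dirichlet concentrations (all values assumed)."""
    alpha = np.array([5.0, 2.0, 2.0])
    if x[0] > 4.0:  # hypothetical breakpoint on covariate1
        alpha = alpha + np.array([0.0, 1.0, 4.0])
    if x[1] > 2.0:  # hypothetical breakpoint on covariate2
        alpha = alpha + np.array([3.0, 0.0, 0.0])
    return alpha


# One Dirichlet draw per sample gives the share of input going to each category.
proportions = np.array([rng.dirichlet(concentration(x)) for x in covariates])

# The machine sorts all of the input, so the category amounts sum to the total.
outputs = proportions * total_input[:, None]
```

Each row of `outputs` plays the role of (output_category1, output_category2, output_category3) for one sample_id.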

**How would you suggest modeling this situation? How could we infer or predict the likely distribution across output categories given the covariate values?** As a beginner in Bayesian modeling and PyMC, I would greatly appreciate your help.

sample_id | total_sample_input | output_category1 | output_category2 | output_category3 | covariate1 | covariate2 | covariate3 |
---|---|---|---|---|---|---|---|
0 | 34124.49 | 25490.77301 | 3568.280369 | 5065.436624 | 0.827175228 | 3.263871777 | 3.908952995 |
1 | 33855.42 | 10600.98815 | 1922.761763 | 21331.67009 | 7.192615377 | 0.348475462 | 0.458909161 |
2 | 30155.25 | 25552.07541 | 2265.240505 | 2337.934086 | 6.497646449 | 1.205781579 | 0.296571973 |
3 | 45996.91 | 26231.19332 | 16543.08086 | 3222.635825 | 7.276013079 | 0.605861635 | 0.118125286 |
4 | 33108.02 | 12336.13038 | 9199.140636 | 11572.74898 | 6.876460974 | 0.191660237 | 0.93187879 |

Initially, I tried to apply the methods in the “Dirichlet Mixtures of Multinomials” example, but I struggled to adapt it to my situation for two reasons: my data is not technically count data (total_sample_input can be split into fractional amounts among the output categories), and certain output categories can correlate with each other at certain covariate values.

I then started to apply the methods discussed in the “Categorical Regression” example, because I thought BART might capture the piecewise impact of the covariates on the output categories well, but again I struggled to adapt the example to my non-count data.

Your help is greatly appreciated. Thanks!