I am working on a time-series prediction project where the response is heavily zero-inflated: more than 30% of the values of the dependent variable are zeros. Could someone share some examples of handling this in PyMC? I suspect a mixture of Bernoulli and Normal distributions, or something similar, might work, since the negative binomial distribution is not an option for non-integer responses.

PyMC offers both Zero-Inflated Poisson and Zero-Inflated Negative Binomial distributions. But if you need a truly continuous variable, you can take a look at examples of mixtures such as this for some ideas. Or, if you have something started, you can post it and see if others can get you headed in the right direction.

You can pass the zeros into a Binomial likelihood and the non-zeros into a separate Normal likelihood without loss of information. There is no ambiguity as to which "component" generated each type of data, so you don't really need a Mixture.
