Using CustomDist for sample_prior_predictive

In general, a good place to start is to generate synthetic data that faithfully reflects the model you are using for inference. Generating a relatively small amount of data lets you iterate quickly, and checking the posteriors against the known true parameter values tells you whether the model is doing what you want it to. Jumping straight to large, real data sets can be very tricky, even for those who have been doing this sort of thing for a while.
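
To make the workflow concrete, here is a minimal sketch of that simulate-then-recover loop. It is not your model: it assumes a toy setup (normal observations with known noise and a normal prior on the mean) so the posterior can be written in closed form with only the standard library, standing in for the sampling step you would run in PyMC. The function name `recover_mu`, the true value 2.5, and the prior settings are all made up for illustration.

```python
import math
import random


def recover_mu(true_mu, sigma, n, prior_mean=0.0, prior_sd=10.0, seed=0):
    """Simulate n points from Normal(true_mu, sigma) and return the
    closed-form posterior (mean, sd) for mu under a Normal prior.

    Assumes sigma is known; this is the conjugate normal-normal update,
    used here in place of MCMC sampling.
    """
    rng = random.Random(seed)

    # Step 1: synthetic data generated from known "true" parameters.
    y = [rng.gauss(true_mu, sigma) for _ in range(n)]

    # Step 2: posterior for mu (precision-weighted combination of prior and data).
    prior_prec = 1.0 / prior_sd**2
    data_prec = n / sigma**2
    post_var = 1.0 / (prior_prec + data_prec)
    post_mean = post_var * (prior_mean * prior_prec + sum(y) / sigma**2)
    return post_mean, math.sqrt(post_var)


TRUE_MU = 2.5  # hypothetical ground truth, chosen for illustration
post_mean, post_sd = recover_mu(true_mu=TRUE_MU, sigma=1.0, n=50)

# Step 3: compare the posterior with the value used to generate the data.
lo, hi = post_mean - 1.96 * post_sd, post_mean + 1.96 * post_sd
print(f"true mu = {TRUE_MU}, posterior = {post_mean:.3f} +/- {post_sd:.3f}")
print(f"approx. 95% interval: ({lo:.3f}, {hi:.3f}), covers truth: {lo <= TRUE_MU <= hi}")
```

If the interval routinely misses the true value across several seeds, or the posterior barely moves away from the prior, that usually points to a problem in the model specification rather than in the data, and it is much cheaper to find that out on 50 simulated points than on the real data set.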

If you haven’t already, I would suggest taking a look at this notebook. It has not been updated to PyMC v5, but it may still prove useful.