If you model the entire data set you generated (i.e., if you don’t throw out the rows where simulated_data["sample_type"] == 0
, then you get bimodal data:
and the model fits ok for me:
but the posterior predictive check looks bad:
because (I assume), you are still mixing the p
s on a per observation basis for all observations where simulated_data["sample_type"] == 1
.
Here is the same observed data as plotted above, but plotted as proportions:
There is a peak at approximately 0.25. But the model can’t place a peak there, because the model is a mixture of 0.0001 and 0.5. The data is a mixture of zeros (when simulated_data["sample_type"] == 0
) and a mixture of 0.0001 and 0.5 (when simulated_data["sample_type"] == 1
).