Hello PyMC Community,
I am analyzing a dataset with a regression discontinuity design. The dependent variable, y, is the fraction of students at each unique score who rated their experience of taking an exam remotely above 4, i.e., above ‘somewhat satisfactory’. The independent variable, x, is the unique score. However, my data include exact 0s and 1s, which occur when either none or all of the students at a performance level rated above 4. I have two questions:
- Given that y can be exactly 0 or 1 as well as any fractional value in between, is it suitable to consider this as a straightforward probability distribution? If so, are there specific transformations of y you would recommend so that I can model this with a Beta regression model? Currently, my Beta regression model does not converge because of these boundary values.
- If you think it is a straightforward probability distribution, which regression model would best suit this data, especially given the 0 and 1 values? Perhaps a zero-and-one-inflated Beta distribution? If so, is there an existing PyMC tutorial on such a model that might help?
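To make the first question concrete: one transformation I have seen suggested for boundary values is the "squeeze" from Smithson and Verkuilen (2006), y' = (y(n - 1) + 0.5) / n, where n is the number of observations. Below is a minimal sketch in plain Python. The function name `squeeze_unit_interval` and the example values are made up for illustration and are not taken from my data, and I am not sure this is the right approach here.

```python
def squeeze_unit_interval(y):
    """Map values in [0, 1] into the open interval (0, 1).

    Uses y' = (y * (n - 1) + 0.5) / n, with n = number of observations.
    """
    n = len(y)
    return [(v * (n - 1) + 0.5) / n for v in y]


# Made-up example fractions, including the problematic exact 0 and 1.
fractions = [0.0, 0.25, 0.5, 1.0]
squeezed = squeeze_unit_interval(fractions)
print(squeezed)
```

After this step no value is exactly 0 or 1, so a Beta likelihood is at least defined for every observation. The size of the shift depends on n, which is part of why I am unsure whether it is appropriate.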
I appreciate any insights or suggestions.
I have attached simulated data below that illustrates the issue. The independent variable, test_performance, has been centered at 0 based on an observed discontinuity at 4. The threshold variable represents treatment assignment. Fractions is the dependent variable.
simulated.csv (1.2 KB)
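For readers who do not open the file, here is a rough sketch of how columns like test_performance and threshold can be built from raw scores. The raw score values are hypothetical, and treating scores at or above the cutoff as treated (`>= 0`) is an assumption made only for this illustration.

```python
CUTOFF = 4  # discontinuity mentioned above

# Hypothetical raw scores, not the values in simulated.csv.
raw_scores = [2.0, 3.5, 4.0, 4.5, 6.0]

# Center the running variable so the discontinuity sits at 0.
test_performance = [s - CUTOFF for s in raw_scores]

# Treatment indicator; the direction (>= 0) is an assumption.
threshold = [1 if x >= 0 else 0 for x in test_performance]

print(list(zip(test_performance, threshold)))
```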