In hindsight, it makes sense that the intercept would not be determined, since in this case it does not matter: one of the two options is always chosen, and the choice probabilities are unaffected when the same constant is added to both.
My problem remains, though: I still cannot replicate the results with the Categorical model (model 2).
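
To illustrate the point about the intercept, here is a minimal sketch, assuming the choice probabilities come from a softmax over the two options. The utilities and the intercept value are made up for illustration and are not taken from either model.

```python
import numpy as np

def softmax(logits):
    # Subtract the max for numerical stability; this is itself a constant
    # shift, so it does not change the resulting probabilities.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical utilities for the two options (illustrative values only).
utilities = np.array([0.3, -1.2])
intercept = 5.0  # the same constant added to both options

p_without = softmax(utilities)
p_with = softmax(utilities + intercept)

# The probabilities depend only on the difference between the utilities,
# so the shared intercept cancels out.
print(p_without, p_with)
print(np.allclose(p_without, p_with))
```

Since any value of the intercept gives the same likelihood, the data cannot pin it down, and its posterior would just reflect the prior.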