Can you marginalize a mixture model where the draws from the different components are not independent?

That’s correct. Your Potential trick only works with explicitly sampled variables. If you marginalized the indicator variables instead, you would need to compute the posterior probability of each indicator to apply the Potential penalty term, and that is not straightforward given how PyMC models are built as a DAG of conditional dependencies.
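To make "posterior probability of each indicator variable" concrete, here is a minimal sketch in plain Python for the simplest case: one observation from a normal mixture with independent components and known parameters. It is not PyMC code, and the function names and the example numbers are made up for illustration.

```python
import math


def normal_logpdf(y, mu, sigma):
    """Log density of Normal(mu, sigma) evaluated at y."""
    return -0.5 * math.log(2 * math.pi) - math.log(sigma) - 0.5 * ((y - mu) / sigma) ** 2


def indicator_posterior(y, weights, mus, sigmas):
    """Return (p(z = k | y) for each k, log p(y)) for a single observation.

    Hypothetical helper: assumes the component parameters are known and
    that components are independent, which is the easy case.
    """
    # log p(z = k) + log p(y | z = k) for every component k
    log_joint = [
        math.log(w) + normal_logpdf(y, m, s)
        for w, m, s in zip(weights, mus, sigmas)
    ]
    # Numerically stable log-sum-exp gives the marginal log-likelihood log p(y)
    top = max(log_joint)
    log_marginal = top + math.log(sum(math.exp(v - top) for v in log_joint))
    # Normalizing the joint terms gives the posterior over the indicator
    posterior = [math.exp(v - log_marginal) for v in log_joint]
    return posterior, log_marginal


# Illustrative values only: two equally weighted components at -1 and 1
post, log_marginal = indicator_posterior(
    y=0.0, weights=[0.5, 0.5], mus=[-1.0, 1.0], sigmas=[1.0, 1.0]
)
```

With independent components this is a closed-form computation per observation. Once the draws from different components depend on each other, the indicator posterior no longer factorizes this way, which is the part that is hard to derive automatically from the model graph.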