Mixture of continuous and discrete logp

Causal inference is most commonly captured by the model's graph structure (at least in the Bayesian DAGs that PyMC3 can model), so even if you model the discrete and continuous observed variables separately, you can still model the causal relationship. For example, you can use parameters that are shared between the discrete and continuous variables.
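
To make the shared-parameter idea concrete, here is a minimal sketch in plain Python (no PyMC3 needed) of a joint logp that mixes a discrete and a continuous term. Everything in it is an assumption for illustration: the Poisson and Normal likelihoods, the shared parameter `theta`, and the toy data are made up and not taken from your model. The point is only that the two observed variables contribute separate log-probability terms that are summed, and that both terms depend on the same parameter.

```python
import math


def poisson_logpmf(k, rate):
    # Log-probability mass of a Poisson count k with the given rate.
    return k * math.log(rate) - rate - math.lgamma(k + 1)


def normal_logpdf(x, mu, sigma):
    # Log-density of a Normal(mu, sigma) evaluated at x.
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2 * math.pi)


def joint_logp(theta, counts, measurements, sigma=1.0):
    # Hypothetical model: theta is shared by both likelihoods.
    # Discrete part: counts ~ Poisson(exp(theta)).
    discrete = sum(poisson_logpmf(k, math.exp(theta)) for k in counts)
    # Continuous part: measurements ~ Normal(theta, sigma).
    continuous = sum(normal_logpdf(y, theta, sigma) for y in measurements)
    # The joint logp is the sum of the discrete and continuous terms.
    return discrete + continuous


# Toy data, invented for this example.
counts = [2, 3, 1, 4]
measurements = [0.8, 1.1, 0.9]

# Crude grid search over theta, just to show that both data sets inform it.
grid = [i / 100 for i in range(-100, 201)]
best_theta = max(grid, key=lambda t: joint_logp(t, counts, measurements))
print(best_theta)
```

In PyMC3 you would not write this by hand: declaring two observed variables that use the same parameter inside one model gives you the same kind of summed logp, and the sampler takes care of the rest.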

I think this will become clearer once you have a concrete setup and data - feel free to come back and update this when you start building your model.