In general, a latent discrete variable is better modeled as a continuous mixture (i.e., with the discrete variable marginalized out); in that case you can get rid of the mask and the indexing. Some pointers on converting a discrete latent variable into a mixture can be found in: Frequently Asked Questions - #15 by junpenglao
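As a rough illustration of what marginalizing out the discrete variable means, here is a minimal sketch in plain Python (not PyMC3 code). It assumes a two-component Gaussian mixture; the helper names (`normal_logpdf`, `logsumexp`, `mixture_logp`) and all the numbers are made up for the example. Instead of sampling a discrete indicator z, the log-likelihood sums over its values: log p(y) = logsumexp_k(log w_k + log N(y | mu_k, sigma_k)).

```python
import math


def normal_logpdf(y, mu, sigma):
    """Log density of a Normal(mu, sigma) evaluated at y."""
    z = (y - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def mixture_logp(y, weights, mus, sigmas):
    """Log density of y with the component indicator summed out."""
    terms = [
        math.log(w) + normal_logpdf(y, mu, s)
        for w, mu, s in zip(weights, mus, sigmas)
    ]
    return logsumexp(terms)


# Hypothetical mixture parameters, chosen only for illustration.
weights = [0.3, 0.7]
mus = [-1.0, 2.0]
sigmas = [0.5, 1.5]

logp = mixture_logp(0.4, weights, mus, sigmas)
print(logp)
```

The resulting log density depends only on continuous parameters, so no discrete variable (and no indexing by it) is left in the model.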
Not out of the box. Note that PyMC3 handles a masked NumPy array passed as observed by indexing into the non-masked values to compute the log_prob, and indexing into the masked values to create a new random variable. In principle you can do the same yourself by indexing and doing the two subtractions.
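The indexing idea can be sketched with NumPy alone. This is only an illustration of splitting a masked array into its observed and missing parts, not PyMC3's actual implementation; the data, `mu`, and `sigma` are made-up values, and the Normal log density is written out by hand.

```python
import numpy as np

# Hypothetical data: entries 1 and 3 are missing (masked).
y = np.ma.masked_array(
    [1.2, 0.0, -0.3, 0.0, 2.1],
    mask=[False, True, False, True, False],
)

mask = np.ma.getmaskarray(y)
observed_idx = np.flatnonzero(~mask)  # positions with real data
missing_idx = np.flatnonzero(mask)    # positions that would become new random variables

# Only the non-masked values enter the likelihood.
y_obs = y.data[observed_idx]

# Assumed Normal likelihood parameters, for illustration only.
mu, sigma = 0.5, 1.0
logp_obs = np.sum(
    -0.5 * ((y_obs - mu) / sigma) ** 2
    - np.log(sigma)
    - 0.5 * np.log(2.0 * np.pi)
)

print(observed_idx, missing_idx, logp_obs)
```

In a model, the positions in `missing_idx` would be given their own random variable, while the log-probability of the observed data is computed from `y_obs` only.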
A Deterministic does not have a log density that you can evaluate.