Thanks for your suggestion and kind help. The discrete and continuous variables rarely share parameters, so the causal relations, whether implicit or explicit, are mostly among the discrete and continuous variables themselves. For example, a discrete variable may enter a linear function that serves as the mean of the normal distribution of a continuous variable. In my case, each record has a categorical diagnosis and continuous prescription doses as predictors, plus an assessment (a categorical class label). This is how I came up with p(y=c|x_discrete, x_continuous, theta_discrete, theta_continuous) as shown before. Alternatively, I could model prescription as depending on diagnosis, which might make the class labels independent of diagnosis given prescription, but I do have cases in the project where the class labels depend on both the continuous and the discrete variables. So far this is an estimation problem for classification. I also want to turn it into a clustering problem by ignoring the labels, to see whether the clusters found in the unlabeled data and the groups from classification overlap reasonably well in distribution. Hopefully this provides a convincing explanation of why I have discrete and continuous variables in the same model.
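
As a rough sketch of the kind of model described above (not the actual project code), the class posterior can be computed by Bayes' rule if one assumes the generative factorisation p(y) p(diagnosis|y) p(dose|diagnosis, y), with the dose normally distributed around a class-specific linear function of the diagnosis. That factorisation, the function names, the parameter names (prior, theta_discrete, intercept, slope, sd) and all numbers below are illustrative assumptions.

```python
import numpy as np

def log_normal_pdf(x, mean, sd):
    """Log density of a normal distribution, evaluated elementwise."""
    return -0.5 * np.log(2.0 * np.pi * sd ** 2) - 0.5 * ((x - mean) / sd) ** 2

def class_posterior(diagnosis, dose, prior, theta_discrete, intercept, slope, sd):
    """Return p(y=c | diagnosis, dose) for every class c.

    Assumed (hypothetical) parameterisation:
      prior[c]             = p(y=c)
      theta_discrete[c, k] = p(diagnosis=k | y=c)
      dose | diagnosis=k, y=c ~ Normal(intercept[c] + slope[c, k], sd[c])
    """
    # The discrete variable enters the mean of the continuous variable.
    mean = intercept + slope[:, diagnosis]
    log_joint = (
        np.log(prior)
        + np.log(theta_discrete[:, diagnosis])
        + log_normal_pdf(dose, mean, sd)
    )
    # Subtract the maximum before exponentiating for numerical stability.
    log_joint = log_joint - log_joint.max()
    posterior = np.exp(log_joint)
    return posterior / posterior.sum()

# Made-up parameters: two classes, two diagnosis codes.
prior = np.array([0.5, 0.5])
theta_discrete = np.array([[0.8, 0.2],
                           [0.3, 0.7]])
intercept = np.array([10.0, 20.0])
slope = np.array([[0.0, 5.0],
                  [0.0, 5.0]])
sd = np.array([2.0, 2.0])

post = class_posterior(diagnosis=0, dose=10.0, prior=prior,
                       theta_discrete=theta_discrete,
                       intercept=intercept, slope=slope, sd=sd)
print(post)
```

Here the class label depends on both the diagnosis and the dose. Dropping the labels and treating y as latent would turn the same likelihood into a mixture model for clustering.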
Regards
Chris