New to PyMC3 (used PyMC2 for ages, then Stan for a long time, now curious to try PyMC3), and wondering what the “best” way is to handle a problem I commonly run into.
I’d like to use a Bayesian Network to synthesize different data sources that all measure things a little differently. One tricky aspect is that the marginal probabilities I get from those sources all use slightly different methods for discretizing the independent variables. For instance, one gives values by wealth quartiles while the other uses quintiles; or they give marginals by age group, but use different (non-overlapping) bins in reporting.
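To make the quartile vs. quintile mismatch concrete, here is a small sketch of how the two binning schemes overlap. It assumes the underlying variable is expressed as a percentile rank (so uniform on [0, 1]); the helper names `bin_edges` and `overlap_matrix` are just made up for this illustration.

```python
from fractions import Fraction

def bin_edges(n_bins):
    """Edges of n equal-probability bins on the percentile scale [0, 1]."""
    return [Fraction(i, n_bins) for i in range(n_bins + 1)]

def overlap_matrix(edges_a, edges_b):
    """overlap[i][j] = P(X in bin i of scheme a and bin j of scheme b).

    Assumption: X is a percentile rank, i.e. uniform on [0, 1], so the
    probability of an interval is simply its width.
    """
    rows = []
    for lo_a, hi_a in zip(edges_a[:-1], edges_a[1:]):
        row = []
        for lo_b, hi_b in zip(edges_b[:-1], edges_b[1:]):
            width = min(hi_a, hi_b) - max(lo_a, lo_b)
            row.append(max(width, Fraction(0)))  # disjoint bins get 0
        rows.append(row)
    return rows

quartiles = bin_edges(4)
quintiles = bin_edges(5)
overlap = overlap_matrix(quartiles, quintiles)

# Each row is one quartile; each column is one quintile.
for row in overlap:
    print([str(p) for p in row])
```

Each quartile straddles two quintiles, which is exactly why a table reported by quartile can't be lined up directly with one reported by quintile.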
So for a simplified example, say I have information on p(A | X') and p(B | X*), where X' and X* are different discretized transformations of the underlying continuous variable X. And I then want to, e.g., estimate the joint distribution of
I could imagine a few ways that might work to model that:
1. Include X as a continuous random variable in the model and then create X' and X* using discretizing transformations
2. Include both X' and X* as random variables by using a multivariate distribution that attempts to incorporate the relationships between them, e.g. as a correlation matrix
3. Instead start by simulating individuals from the population directly using the marginal probabilities
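To show what I mean by the first approach (continuous X, then discretize), here is a rough forward-simulation sketch in plain Python rather than PyMC3. Everything specific in it is an assumption for illustration: the probability tables are made up, X is treated as a percentile rank, A and B are taken to be binary, and A and B are assumed conditionally independent given X.

```python
import random

# Hypothetical conditional probabilities (made up for illustration only).
P_A_GIVEN_QUARTILE = [0.10, 0.20, 0.35, 0.50]        # p(A=1 | X')
P_B_GIVEN_QUINTILE = [0.60, 0.45, 0.30, 0.20, 0.10]  # p(B=1 | X*)

def discretize(x, n_bins):
    """Map a percentile rank x in [0, 1) to a bin index 0..n_bins-1."""
    return min(int(x * n_bins), n_bins - 1)

def simulate_joint(n_draws, seed=0):
    """Monte Carlo estimate of p(A, B).

    Assumption: A and B are conditionally independent given the latent X.
    """
    rng = random.Random(seed)
    counts = {(a, b): 0 for a in (0, 1) for b in (0, 1)}
    for _ in range(n_draws):
        x = rng.random()             # latent continuous X (percentile rank)
        x_quart = discretize(x, 4)   # X': quartile of X
        x_quint = discretize(x, 5)   # X*: quintile of X
        a = int(rng.random() < P_A_GIVEN_QUARTILE[x_quart])
        b = int(rng.random() < P_B_GIVEN_QUINTILE[x_quint])
        counts[(a, b)] += 1
    return {k: v / n_draws for k, v in counts.items()}

joint = simulate_joint(100_000)
for (a, b), p in sorted(joint.items()):
    print(f"p(A={a}, B={b}) ~ {p:.3f}")
```

This only covers the generative direction with known tables; the part I'm unsure about is doing inference on those tables inside the sampler.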
My intuition is that option 1 is the most correct, but that I would run into performance issues due to the discretization (at least, that was my experience with Stan in the past). Any advice (and suggestions/examples of how to implement it) would be greatly appreciated!
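One idea I've been toying with to sidestep the discretization problem: because X' and X* are deterministic functions of X, X can be integrated out piecewise over the merged bin edges, so no discrete latent variable has to be sampled at all. A minimal sketch, under the same assumptions as before (hypothetical tables, X as a uniform percentile rank, binary A and B that are conditionally independent given X):

```python
from fractions import Fraction

# Hypothetical conditional probabilities (made up for illustration only).
P_A_GIVEN_QUARTILE = [0.10, 0.20, 0.35, 0.50]        # p(A=1 | X')
P_B_GIVEN_QUINTILE = [0.60, 0.45, 0.30, 0.20, 0.10]  # p(B=1 | X*)

def exact_joint(p_a, p_b):
    """Exact p(A, B) for X uniform on [0, 1], integrating X out piecewise.

    Assumption: A and B are conditionally independent given X.
    """
    n_a, n_b = len(p_a), len(p_b)
    # Merge both sets of bin edges; within each resulting segment,
    # both X' and X* are constant.
    cuts = sorted({Fraction(i, n_a) for i in range(n_a + 1)}
                  | {Fraction(j, n_b) for j in range(n_b + 1)})
    joint = {(a, b): 0.0 for a in (0, 1) for b in (0, 1)}
    for lo, hi in zip(cuts[:-1], cuts[1:]):
        mid = (lo + hi) / 2
        pa = p_a[int(mid * n_a)]   # p(A=1) on this segment
        pb = p_b[int(mid * n_b)]   # p(B=1) on this segment
        weight = float(hi - lo)    # P(X falls in this segment)
        joint[(1, 1)] += weight * pa * pb
        joint[(1, 0)] += weight * pa * (1 - pb)
        joint[(0, 1)] += weight * (1 - pa) * pb
        joint[(0, 0)] += weight * (1 - pa) * (1 - pb)
    return joint

joint = exact_joint(P_A_GIVEN_QUARTILE, P_B_GIVEN_QUINTILE)
for (a, b), p in sorted(joint.items()):
    print(f"p(A={a}, B={b}) = {p:.5f}")
```

For quartiles and quintiles this is a sum over only eight segments, so it stays cheap; whether the same trick scales to a realistic network with several such variables is what I'm not sure about.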