I’ve been reading the excellent post about approximating the marginal likelihood for model selection from @junpenglao [Marginal_likelihood_in_PyMC3] (Motif of the Mind | Junpeng Lao, PhD) and learnt a lot. It will be highly appreciated if I can have a chance to discuss some follow-up questions in this forum.
The parameters in the given examples are all continuous. For me,I want to apply the strategies mentioned in the post like SMC and Bridge sampling to a mixture model. The mixture model includes a discrete variable called assignment indicating which group each subject belongs to, and a corresponding weights variable called p drawn from a Dirichlet distribution so that assignment ~ Categorical(p). I’m not sure how to modify the implementation so that these methods can also apply to the case where the discrete variable assignment and the constraint variable p (between 0 and 1) exist.
- Regarding the Bridge sampling, in the post, a multivariate normal distribution is used as the proposal distribution, and the corresponding code is:
# get mean & covariance matrix and generate samples from proposal m = np.mean(samples_4_fit, axis=1) V = np.cov(samples_4_fit) L = chol(V, lower=True) # Draw N2 samples from the proposal distribution gen_samples = m[:, None] + dot(L, st.norm.rvs(0, 1, size=samples_4_iter.shape)) # Evaluate proposal distribution for posterior & generated samples q12 = st.multivariate_normal.logpdf(samples_4_iter.T, m, V) q22 = st.multivariate_normal.logpdf(gen_samples.T, m, V) # Evaluate unnormalized posterior for posterior & generated samples q11 = np.asarray([logp(point) for point in samples_4_iter.T]) q21 = np.asarray([logp(point) for point in gen_samples.T])
For my mixture model case, I’m wondering how to modify the proposal becuase it may not appropriate to draw discrete variables from the multivariate normal distribution. But my model do have other continuous variables as well. If I directly implement the code, I will get:
LinAlgError: 31-th leading minor of the array is not positive definite for the
L = chol(V,lower= True)
command, because for each subject, the samples of the discrete variable called assignment converges to the same number, then some rows of the matrix V have identical elements.
- Regarding the SMC method, it seems that the
pm.sample_smcfunction has some problem sampling the discrete variable since I got
11 frames <__array_function__ internals> in broadcast_arrays(*args, **kwargs) /usr/local/lib/python3.7/dist-packages/numpy/lib/stride_tricks.py in _broadcast_shape(*args) 418 # use the old-iterator because np.nditer does not handle size 0 arrays 419 # consistently --> 420 b = np.broadcast(*args[:32]) 421 # unfortunately, it cannot handle 32 or more arguments directly 422 for pos in range(32, len(args), 31): ValueError: shape mismatch: objects cannot be broadcast to a single shape
I think my model statement is correct because it can be sucessfully implemeted using
pm.sample with CategoricalGibbsMetropolis for discrete variables and Metropolis for continuous variables.
Therefore, I’m wondering for my case with both discrete and continuous variables, as well as a weight variable which needs to be between [0,1], is it possible to use Bridge sampling or SMC to obtain the approximation of mariginal likelihood for model comparison? If not, is there any alternative menthod to estimate the marginal likelihood?
Thank you so much for your help!