Understanding implementation of initial values in the usage with GMMs

ms_ger · September 6, 2024, 9:09am

Hi,

I am currently studying the usage of Gaussian mixture models (GMMs) in pymc.

I wanted to get the log likelihood of a GMM and ran into problems retrieving it because of how I implemented the model.

Let's look at this very simple example, which works:


import numpy as np
import pymc as pm

### some data:
rng = np.random.default_rng(123)
N = 1000
W = np.array([0.2, 0.5, 0.3])
MU = np.array([-1, 0, 1])
SIGMA = np.array([0.5, 1, 0.5])
component = rng.choice(MU.size, size=N, p=W)
x = rng.normal(MU[component], SIGMA[component], size=N)

### the model that works
K = 3
with pm.Model() as model:
    w = pm.Dirichlet("w", a=np.ones(K))
    mu = pm.Normal("mu", mu=np.linspace(x.min(), x.max(), K), sigma=10, shape=K, transform=pm.distributions.transforms.univariate_ordered)
    sigma = pm.HalfNormal("sigma", sigma=1)
    y = pm.NormalMixture("y", w=w, mu=mu, sigma=sigma, observed=x)
    trace = pm.sample(idata_kwargs={"log_likelihood": True})
    pm.sample_posterior_predictive(trace, extend_inferencedata=True)

If I change mu to:

mu = pm.Normal("mu", mu=0, sigma=10, shape=K, transform=pm.distributions.transforms.univariate_ordered, initval=np.linspace(x.min(), x.max(), K))

I get the following error:

NotImplementedError: Cannot convert models with non-default initial_values

What are the implications of the two approaches? Shouldn’t they be equivalent?

ricardoV94 · September 6, 2024, 1:28pm

When do you get the error? Is it when computing log_likelihood?

ms_ger · September 6, 2024, 2:39pm

Exactly

ricardoV94 · September 6, 2024, 3:20pm

In the meantime, you can work around it by not setting initval in the distribution and instead passing initvals when calling pm.sample.

jessegrabowski · September 6, 2024, 3:34pm

I believe univariate_ordered is deprecated as well. You should use ordered (and upgrade to the latest version if you’re not getting a nagging message about it).

ms_ger · September 6, 2024, 3:48pm

Thank you for the help.

I checked, and passing initvals to pm.sample solves the issue. Using ordered does not seem to work, though: the chains do not appear to converge to the same values.

I am using pymc version 5.16.1.

That's the image for ordered

That's the image for univariate_ordered

jessegrabowski · September 6, 2024, 3:51pm

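You are experiencing run-to-run variation. univariate_ordered is only a deprecated alias that returns ordered, so both runs use the same transform. Here is the code for univariate_ordered: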

    if name in ("univariate_ordered", "multivariate_ordered"):
        warnings.warn(f"{name} has been deprecated, use ordered instead.", FutureWarning)
        return ordered
