Defining an unpooled mixture model

buckeye17 · July 22, 2024, 10:00pm

I want to define an unpooled model consisting of a mixture of 5 Gaussians. I’m having trouble defining the shape of the model components. Here’s my code:

categories = df.unit.unique() # there are 24 categories
idx = pd.Categorical(df.unit, categories=categories).codes # there are 21866 rows of data
coords = {"units": categories, "units_flat": df.unit.values}

with pm.Model(coords=coords) as model:
    p = pm.Dirichlet("p", a=5*[1])
    mu = pm.Normal("mu", mu=[1, 24, 48, 72, 96], sigma=5*[2], shape=5, dims="unit")
    sigma = pm.HalfNormal("sigma", sigma=5, shape=5, dims="units")
    components = pm.TruncatedNormal.dist(mu=mu[idx], sigma=sigma[idx], lower=len(idx)*[1], upper=None)
    
    Y = pm.Mixture("Y", w=p, comp_dists=components, observed=df.hours, dims="unit_flat")

The error message I get states, ValueError: Alloc static input type and target shape are incompatible: Vector(float64, shape=(5,)) vs (21886,)

ricardoV94 · July 23, 2024, 4:40am

Why are you indexing into the parsmeters of the components? That looks like you know what component each observation belongs to, which wouldn’t be a mixture scenario.

That way you’re defining 21k components a priori not the five you wanted.

buckeye17 · July 24, 2024, 4:25am

I’ll admit that I don’t fully understand the syntax which defines the unpooled groups. After further examination I believe I found my error. The likelihood distribution should not provide a dims argument. In other words, the likelihood distribution should be Y = pm.Mixture("Y", w=p, comp_dists=components, observed=df.hours).

However, I’ve realized this model isn’t quite what I wish it to be. This is because the Dirichlet distribution is still fully pooled, whereas I would like it to be unpooled. How can I make it unpooled?

ricardoV94 · July 24, 2024, 1:52pm

Can you explain a bit more about the whole model you have in mind. The mixture part still seems strange. Perhaps you have some code that generates data according to your model?

buckeye17 · July 24, 2024, 4:27pm

Sure. The data I am fitting represents the duration spent completing various repairs. A histogram suggests that the data is a mixture of Gaussians. It is multi-modal, where the largest peak occurs at 1 hour, the next largest peak at 24, then 48, 72 & 96. The data is also grouped into 24 groups, corresponding to different facilities that are carrying out this work. I am attempting to fit an unpooled model to this data as a stepping-stone to a partially pooled model. Does that help?

ricardoV94 · July 24, 2024, 4:52pm

Do you have no predictors you can feed into the model? A model that’s Normal on b0 + x*b1 can look marginally multimodal (say if x comes in roughly 5 levels) the you wouldn’t want to fit it as a Mixture model but a GLM.

If indeed you want a mixture, do you want to have 5 means learned from the data for each group or pooled? If you don’t pool the means it doesn’t make sense to pool the weights (unless there’s more details you are aware of).

The Mixture API doesn’t allow easily to make heterogeneous observations (unless you have the same number of observations per site). So you would need to define different Mixture for each site, each with 5 components.

buckeye17 · July 24, 2024, 6:00pm

So far I have resisted incorporating predictors into this model in the spirit of keeping it simple. My plan was to try adding predictors after I had done my best to model the observed Y density.

My plan was to fit an unpooled mixture model to most groups that have the most data, using the resulting group means and standard deviations as a starting point for the hyper priors of the hierarchical model. In both of these models I believe it would be imprudent to fix the ratio of the 5 Gaussians across all groups. Instead I think each group should have its own ratios of Gaussians.

ricardoV94 · July 24, 2024, 6:05pm

Then the easiest is to have a Mixture for each group instead of trying to combine all in one

Topic		Replies	Views
Unpooled gauss mixture regression model v5 modeling	1	77	June 19, 2024
Unpooled Dirichlet Process	1	50	September 18, 2024
Unpooled Accelerated Failure Time Model version agnostic	23	75	November 25, 2024
How can we build a mixture of mixtures? Questions	28	4247	February 17, 2021
Help with dimensionality of observation data in Mixtures v5 modeling	12	152	July 8, 2024

Defining an unpooled mixture model

Related topics