Fitting a binomial mixture from data of differing sample sizes

marty0801 · May 26, 2018, 2:22pm

I would like to do Bayesian sampling from the posterior distribution of a binomial mixture model, conditioned on observed data which are samples of the mixture with different sample sizes. Thus, the data are not identically distributed, but depend upon the same parameters. Is PYMC equipped to do this kind of thing?

In math: the parametric model comprises class probabilities \pi=(\pi_1,...\pi_L) and binomial success parameters \psi=(\psi_1,...\psi_L). A draw from the model given sample size N consists of sampling f(X|N,\psi,\pi) := \sum_{l=1}^L \pi_l f(X|N,\psi_l) where f(X|N,\psi_l) is the standard binomial PMF.

When all samples X_k have the same N, there is no problem. My question is how to sample the posterior distribution (with the normal priors–Dirichlet on \pi, uniform on \psi) when sample sizes N are different between draws.

junpenglao · May 26, 2018, 2:33pm

The N in the pm.Binomial can be a array or a tensor - so I dont think it is much of a problem in principle:

    ...
    dist1 = pm.Binomial.dist(n=N1, p=psi1)
    dist2 = pm.Binomial.dist(n=N2, p=psi2)
    mixmod = pm.Mixture('mixmodel', w=w, comp_dists=[dist1, dist2], observed=data)

marty0801 · May 26, 2018, 2:46pm

It doesn’t seem to work. When N is a vector comparable to the data vector, it breaks; when N is constant, it works:

import pymc3
import numpy as np


n_samps = 10
N = np.random.randint(50,100,n_samps)# breaks
N = 100 # works
P = np.random.rand(n_samps)
data = np.random.binomial(N,P)

n_comps = 3

with pymc3.Model() as model:
    w = pymc3.Dirichlet('w', a=np.ones(n_comps))
    psi0 = pymc3.Uniform('psi0',lower=0,upper=1,shape=n_comps)
    dists_ar = pymc3.Binomial.dist(n=N, p=psi0, shape=n_comps)
    mixmod = pymc3.Mixture('mixmodel',w=w,comp_dists=dists_ar,observed=data)

I would think the use-case I’ve described would be a fairly unconventional application of Bayesian sampling, since the data are NOT samples of the same distribution. It would seem surprising to me if pymc was equipped to handle this by default.

marty0801 · May 26, 2018, 2:56pm

@junpenglao I don’t quite understand your example; could you elaborate? Do N1 and N2 represent fixed tensors the same size as data?

junpenglao · May 26, 2018, 3:10pm

Be more careful of the data broadcasting and it should work:

with pymc3.Model() as model:
    w = pymc3.Dirichlet('w', a=np.ones(n_comps))
    psi0 = pymc3.Uniform('psi0',lower=0,upper=1,shape=n_comps)
    dists_ar = pymc3.Binomial.dist(n=N[:, None], p=psi0, shape=(n_samps, n_comps))
    mixmod = pymc3.Mixture('mixmodel', w=w, comp_dists=dists_ar, observed=data)

Figuring out the shape in mixture comp_dist is not always straightforward, a useful way is to first try contructing the mixture components as a Random variable explicitly:

with pymc3.Model() as model:
    w = pymc3.Dirichlet('w', a=np.ones(n_comps))
    psi0 = pymc3.Uniform('psi0',lower=0,upper=1,shape=n_comps)
    dists_ar = pymc3.Binomial('comp', n=N[:, None], p=psi0, shape=(n_samps, n_comps))
    mixmod = pymc3.Mixture('mixmodel', w=w, comp_dists=dists_ar.distribution, observed=data)

doing so you can check the shape of dists_ar by doing dists_ar.tag.test_value.shape to make sure it is the right one.

AidanMar · January 18, 2022, 8:37pm

@junpenglao I would like to do something very similar to this model. I have many binomial samples, each with different numbers of trials as stated here. I would like to model the data as a mixture of two Binomials, like here, but instead of psi0 = pymc3.Uniform(‘psi0’,lower=0,upper=1,shape=n_comps) as done here with shape 2 for the 2 priors in psi0, I would like to use

psi0 = pm.Uniform(‘psi0’,lower=0,upper=1)
psi1 = pm.Constant(‘psi1’, 0.3)
(kind of like 0.3 inflated binomial).
When I try to do this with various implementations I get errors regarding the shapes of the distributions and broadcasing. I have spent a long time trying to wrap my head around the issue but am still struggling to make progress. 1) A template of how to do this would be amazing, 2) pointers to high-quality resources on how to make sense of the shape/broadcasting in pymc3 would extra amazing!

Best wishes!

Topic		Replies	Views
SamplingError In Binomial Model. Practice Problem 1M21 From BMCP Book	2	301	May 10, 2023
Help with dimensionality of observation data in Mixtures v5 modeling	12	141	July 8, 2024
Can I avoid sampling irrelevant bits of the distribution? Questions	1	469	February 2, 2020
Mixture Model Metropolis vs. NUTS Questions	1	588	April 1, 2020
Bernoulli samples all the same? Questions	1	2438	May 29, 2018

Fitting a binomial mixture from data of differing sample sizes

Related topics