Different confidence estimates for models with(out) nuisance parameters

Dear PyMC3 community,

I have a question regarding a Dirichlet-multinomial model I implemented.
I get very different standard deviations for my parameter estimates when I compare the model with explicit nuisance parameters to the model where the nuisance parameters were integrated out, even though the mean estimates are essentially the same.
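
For reference, by "integrated out" I mean replacing the per-observation Dirichlet weights with the marginal Dirichlet-multinomial likelihood

$$ P(x \mid n, \alpha) = \frac{n!\,\Gamma(A)}{\Gamma(n + A)} \prod_{k} \frac{\Gamma(x_k + \alpha_k)}{x_k!\,\Gamma(\alpha_k)}, \qquad A = \sum_k \alpha_k, $$

which is what the DirichletMultinomial class further down implements.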

%matplotlib inline
import seaborn as sns
import pymc3 as pm
from dirichlet_multinomial import DirichletMultinomial
import numpy as np
from numpy import random as nr

# Data generation
alphas = np.array([5, 7, 1, 0.5])
ndraws = 2000
N = 40
dirichlets = np.array([nr.dirichlet(alphas) for i in range(N)])
data = np.array([nr.multinomial(ndraws, p) for p in dirichlets])

with pm.Model() as model:
    lowerbound = pm.Bound(pm.Flat, lower=0)
    a = lowerbound('alphas', shape=len(alphas))
    d = pm.Dirichlet('dirichlets', a, shape=(N, len(alphas)))
    obs = pm.Multinomial('data', n=ndraws, p=d, observed=data)

# model with nuisance parameters
with model:
    trace = pm.sample()
    pm.traceplot(trace, ['alphas'])
    pm.summary(trace, ['alphas'])

This gives the following summary of the alpha parameters:

  Mean             SD               MC Error         95% HPD interval
  -------------------------------------------------------------------
  
  6.062            0.851            0.038            [4.550, 7.803]
  8.400            1.203            0.051            [6.072, 10.692]
  1.032            0.160            0.006            [0.730, 1.339]
  0.595            0.096            0.004            [0.420, 0.775]

  Posterior quantiles:
  2.5            25             50             75             97.5
  |--------------|==============|==============|--------------|
  
  4.562          5.488          6.024          6.621          7.843
  6.176          7.532          8.392          9.201          10.878
  0.755          0.922          1.028          1.138          1.366
  0.423          0.524          0.588          0.663          0.782

Whereas if I run the inference without the nuisance parameters:

with pm.Model() as implicit_model:
    lowerbound = pm.Bound(pm.Flat, lower=0)
    a = lowerbound('alphas', shape=len(alphas))
    obs = DirichletMultinomial('data', n=np.array([ndraws]*N, dtype=np.uint), a=a, observed=data)

with implicit_model:
    implicit_trace = pm.sample()
    pm.traceplot(implicit_trace, ['alphas'])
    pm.summary(implicit_trace, ['alphas'])

the results are much more confident (the standard deviations of the estimates are much lower):

  Mean             SD               MC Error         95% HPD interval
  -------------------------------------------------------------------
  
  5.756            0.131            0.008            [5.520, 6.033]
  7.977            0.185            0.011            [7.625, 8.341]
  0.986            0.026            0.001            [0.937, 1.038]
  0.566            0.015            0.001            [0.539, 0.595]

  Posterior quantiles:
  2.5            25             50             75             97.5
  |--------------|==============|==============|--------------|
  
  5.486          5.670          5.754          5.848          6.015
  7.610          7.850          7.982          8.105          8.334
  0.934          0.968          0.987          1.003          1.035
  0.539          0.556          0.567          0.578          0.595

Traceplot (unable to add more than one image)

Is this behaviour expected or do I have an error in my likelihood specification?
I defined the DirichletMultinomial log likelihood as follows:

import numpy as np
import pymc3 as pm
import theano.tensor as tt
from pymc3.distributions.dist_math import bound, factln
from theano.tensor import gammaln


class DirichletMultinomial(pm.Discrete):
    def __init__(self, n, a, *args, **kwargs):
        super(DirichletMultinomial, self).__init__(*args, **kwargs)

        self.K = tt.as_tensor_variable(a.shape[-1])
        self.n = tt.as_tensor_variable(n[:, np.newaxis])

        if a.ndim == 1:
            self.alphas = tt.as_tensor_variable(a[np.newaxis, :])  # alphas[1, #classes]
            self.A = tt.as_tensor_variable(a.sum(axis=-1))  # A is scalar
            self.mean = self.n * (self.alphas / self.A)
        else:
            self.alphas = tt.as_tensor_variable(a)  # alphas[#samples, #classes]
            self.A = tt.as_tensor_variable(a.sum(axis=-1, keepdims=True))  # A[#samples, 1]
            self.mean = self.n * (self.alphas / self.A)

        self.mode = tt.cast(pm.math.tround(self.mean), 'int32')

    def logp(self, value):
        k = self.K
        a = self.alphas
        res = bound(
            tt.sum(
                factln(self.n) + gammaln(self.A) - gammaln(self.A + self.n)
                + tt.sum(gammaln(self.alphas + value) - gammaln(self.alphas) - factln(value), axis=-1),
            axis=-1),
            tt.all(value >= 0),
            tt.all(tt.eq(tt.sum(value, axis=-1, keepdims=True), self.n)),
            tt.all(a > 0),
            k > 1,
            tt.all(tt.ge(self.n, 0), axis=-1),
            broadcast_conditions=False
        )
        return res 
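
As a quick cross-check, the same log-likelihood can also be written directly with NumPy/SciPy and evaluated on the simulated data from above (a short sketch; dm_logpmf is just a helper name for this post):

import numpy as np
from scipy.special import gammaln as sp_gammaln

def dm_logpmf(x, n, alphas):
    # Dirichlet-multinomial log-pmf, row-wise, in plain NumPy/SciPy
    A = alphas.sum()
    return (sp_gammaln(n + 1) - sp_gammaln(x + 1).sum(axis=-1)
            + sp_gammaln(A) - sp_gammaln(n + A)
            + (sp_gammaln(x + alphas) - sp_gammaln(alphas)).sum(axis=-1))

# total log-likelihood of the simulated counts under the true alphas
print(dm_logpmf(data, ndraws, alphas).sum())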

Here is the full example as a notebook as well:
Jupyter Notebook with example

I would really appreciate any help regarding this issue.

Best,
Max

Hi Max,
I edited your post so that the image can display directly.
As for the SD of the posterior, I would suggest checking the logp of the model - my guess is that the logp is normalized differently.
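
For example, something along these lines (a rough sketch; the test point just supplies a value for every free variable, and the two totals are not directly comparable because the first model also contains the Dirichlet nuisance terms):

# total model logp at the default test point
print(model.logp(model.test_point))
print(implicit_model.logp(implicit_model.test_point))

# logp of just the observed node in each model
print(model.observed_RVs[0].logp(model.test_point))
print(implicit_model.observed_RVs[0].logp(implicit_model.test_point))
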
Also cc @AustinRochford for more expert advice :wink:

Hey Junpeng,

thank you for the advice.

How can I check the logp of the model when the two models contain different variables?
If I understand it correctly, a call to model.logp requires values for all of the model's variables. Would the logp outputs of both models be comparable nevertheless?

I found that there actually was a factorial term missing in my logp definition. Nevertheless, the SDs of the inferred variables stayed exactly the same. If the logp of my model is normalized differently, how could I correct for that? Shouldn't changes in the logp (for example, adding or subtracting a constant) be reflected in the posterior estimates?

I figured it out. Something went weirdly wrong with Theano's broadcasting, resulting in stronger deviations with increasing dimensionality. I made a new gist in case somebody is interested in using it to model count data with a Dirichlet-multinomial :wink:
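
To illustrate the kind of pattern that can go wrong (a minimal sketch, not necessarily the exact expression from my logp): with self.n kept at shape (N, 1), adding it to a term that has already been summed down to shape (N,) silently broadcasts to an (N, N) matrix, so the outer sum mixes every row into each entry.

import numpy as np
import theano.tensor as tt

N = 5
n_col = tt.as_tensor_variable(np.full((N, 1), 2000.0))           # shape (N, 1), like self.n
row_term = tt.as_tensor_variable(np.arange(N, dtype='float64'))  # shape (N,), an already-summed per-row term

combined = n_col + row_term              # broadcasts to shape (N, N) instead of (N,)
print(combined.shape.eval())             # [5 5]
print(tt.sum(combined, axis=-1).eval())  # 5 values, each containing the contribution of every row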


Ah, that’s often a source of subtle bugs. Glad you found it!