Find_MAP when optimizing a Potential

Hi Team, I am struggling to get a meaningful value for a MAP when trying to maximize a Potential on a Deterministic sum of Bernoulli samples.

More precisely I have the following model:

import pymc3 as pm

N = 100 
observed_sum = N * 0.5

with pm.Model() as model:
    # Define the prior for probability p
    p = pm.Uniform('p', lower=0, upper=1)

    # Generate the Bernoulli samples
    samples = pm.Bernoulli('samples', p=p, shape=N)

    # Calculate the sum of the samples
    samples_sum = pm.Deterministic('samples_sum', pm.math.sum(samples))

    # Define the potential to penalize the difference from the observed sum
    potential = pm.Potential('potential', -(samples_sum-observed_sum) ** 2)

    # Perform MAP estimation
    res_MAP = pm.find_MAP()

This is trying to find the probability p such that the sum of N Bernoulli with probability p is closest to N/2.

I would expect res_MAP[‘p’] to be around 0.5, which should maximise the potential, however it is very close to 0.

Can someone see anything obviously wrong in my script?

thanks for the help

1 Like

Welcome!

There’s something going on with the sum calculated over samples, though I am not 100% what. Sampling works:

idata = pm.sample()
idata.posterior["p"].mean()

yields:

<xarray.DataArray 'p' ()>
array(0.5000402)

And this:

    # Generate the Bernoulli samples
    #samples = pm.Bernoulli('samples', p=p, shape=N)
    samples = pm.Binomial('samples', p=p, n=N)

yields:

{'p_interval__': array(0.),
 'samples': array(50),
 'p': array(0.5),
 'samples_sum': array(50)}

I think this might just be too hard for the naive way find_MAP tries to optimize discrete models without knowing about constraints. With 100 bernoullis the probability that it will ever propose a valid state is very very small. Every one of the 100 samples must be either a 0 or 1, but the optimizer doesn’t know this.

As @cluhmann hinted, the Binomial does much better because now the optimizer only has to consider one direction of motion, and everything in the range (0, 100) is a valid state.

Thanks for the answers, yes i guess the map is doing multi stage optimization. Optically it should be a fairly simple problem, that should be convex in the only free variable p, so i was hoping find_map would just be iterating over various p values every time evaluating the potential on a sample.

The more generic problem is the following: how can we solve a bayesian network parameter inference problem (structure is known), where you can draw N samples but you can only observe a deterministic function of the samples?
In my case the network is just one Bernouilli and the deterministic function is the sum of outcomes.
I have seen variants of this problem asked on the forum and i understand sampling cannot really work for non trivial deterministic functions, but i was wondering if pymc3 has other (non sampling) optimization tools that might help

I think you will want to write your own optimizer that understands the discrete variable constraints (maybe doing enumeration like you said). find_MAP had a very specific goal in the context of PyMC, as an old way to initialize the NUTS sampler.

Optimization is not the real goal of PyMC so I don’t think there is anything off the shelve you can use instead. PyMC is very happy to provide you the model logp/gradients for your own optimizer though! Hope that helps

Yes makes sense, thanks!

Actually when you say that pymc can give me the gradient, are you saying that in my example by sampling pymc could give me the gradient of the potential with respect to p at a certain value of p?

I mean you can get the gradient of the model joint logp wrt to any continuous parameters. Is that what you meant?

Using PyMC V5 (V3 is pretty old already, you should be able to get the same things but the API has changed)

import pymc as pm
import numpy as np

N = 100 
observed_sum = N * 0.5

with pm.Model() as model:
    # Define the prior for probability p
    p = pm.Uniform('p', lower=0, upper=1, transform=None)

    # Generate the Bernoulli samples
    samples = pm.Bernoulli('samples', p=p, shape=N)

    # Calculate the sum of the samples
    samples_sum = pm.Deterministic('samples_sum', pm.math.sum(samples))

    # Define the potential to penalize the difference from the observed sum
    potential = pm.Potential('potential', -(samples_sum-observed_sum) ** 2)

p_dlogp = model.compile_dlogp(vars=[p])
p_dlogp({"p": 0.6, "samples": np.repeat([0, 1], 50)})  # array([-41.66666667])

I disabled the automatic transform of p with transform=None. Otherwise you would have p_interval__ as an input instead (which is fine and what you probably want in an optimizer, but less clear example).

Thank you that works, however what is the “samples” argument in the compiled dlogp?
In general samples is not observable in my case.
I was thinking that model.compile_dlogp(vars=[p]) would give the gradient of the posterior distribution with respect to a variable p, under the assumption that the prior is sampled from the uniform distribution.
Is it instead the posterior distribution assuming an observation for samples?

Dlogp gives you the gradient of the p parameter with respect to the model joint probability conditioned on all the unobserved parameters and observed data. Samples is unobserved but a value must be provided by the user (or algorithm).

That’s the whole point no? find_map or nuts get to the posterior by iteratively proposing different values of samples and p. For p you can get gradient information (conditioned on data and all other paramerers) to guide your search more cleverly. For samples you cannot because it’s a discrete parameter.