I’m trying to use PyMC3 to sample from a maximum entropy distribution over binary patterns of length N, constrained by means and pairwise correlations, which is given by:

$$
P(x_1, \ldots, x_N) = \frac{1}{Z(\alpha, \beta)} \exp\left\{ \sum_i \alpha_i x_i + \sum_{i<j} \beta_{ij} x_i x_j \right\}
$$
Such that the alphas and betas are known, and X is a binary multivariate variable. I’ve tried to do this using a DiscreteUniform “baseline” and adding a pm.Potential, but I got weird results (e.g., setting all alphas and betas to zero did not reproduce the DiscreteUniform behavior).
I’d appreciate any ideas on how (and whether) this can be done.
Very sorry for the tardy response! Still stuck on this, though, so help is still welcome.
I’ve started with the simplest example, in which the beta coefficients are zero and the alphas are not (simplest in the sense that the joint distribution factorizes into independent univariate distributions). I began by synthesizing data:
import numpy as np
import pymc3 as pm

data = pm.Model()
alphas = np.array([-1, -2, 0.5, 3.5])
with data:
    # each X[i] is uniform over {0, 1} before the potential is added
    X = pm.DiscreteUniform('X', lower=0, upper=1, shape=4)
    # unnormalized log-probability contribution: alpha . x
    constraints = pm.math.dot(alphas, X)
    potential = pm.Potential("potential", constraints.sum())
    datatrace = pm.sample(4000, step=pm.BinaryMetropolis(vars=[X]))
Then, I tried to learn these parameters from the data:
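Roughly along these lines (a minimal sketch rather than the exact code; it uses the Normal prior mentioned further down, and X_obs stands for the patterns drawn above):

X_obs = datatrace['X']  # binary patterns sampled in the generation step

with pm.Model() as model:
    coeffs = pm.Normal('coeffs', mu=0, sd=1, shape=4)
    # each observed pattern contributes coeffs . x to the log-probability;
    # note that the partition function Z(alpha) is not accounted for here
    pm.Potential('loglike', pm.math.dot(X_obs, coeffs).sum())
    trace = pm.sample(2000)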
Starting to show some code definitely helps, but only saying that the results are weird doesn’t help much in getting to know your problem and what the cause of your error might be. Please describe what you expect and show a resulting trace of your model where it goes wrong.
Cheers!
You need to put constraints on your alpha and beta, or rewrite the energy function. Otherwise, it will just blow up towards negative infinity (you can sample using Metropolis and see that the trace just keeps going negative). As it is right now, the potential function gets smaller and smaller as alpha goes negative.
There is a similar discussion you can have a look at here: PyMC3 for physics
Regarding the constraining - I thought that putting the prior coeffs = pm.Normal('coeffs', mu=0, sd=1, shape=4) would be enough (but it obviously isn’t). I also tried using a uniform prior that includes the relevant alpha values (as in the PyMC3 for physics example you provided), but this failed as well - all values converged to the boundaries of the prior…
Maybe parameterize alpha as a unit vector (i.e., a vector with length 1)? I am not sure what the usual treatment is, but I think I have seen people doing that.
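For instance, something like this (just a sketch; raw is a made-up name):

with pm.Model() as model:
    raw = pm.Normal('raw', mu=0, sd=1, shape=4)
    # project onto the unit sphere so that the overall scale is fixed
    alpha = pm.Deterministic('alpha', raw / pm.math.sqrt((raw ** 2).sum()))

Note that this fixes the length of alpha to 1, so only the direction of the coefficient vector is identified, not its scale.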
I tried this, and indeed I get sensible alpha traces now (thanks!), but without recovering the original values.
Still, this seems like a relatively simple learning task (just 4 parameters, and as much data to learn from as I want…), so I guess there’s something I’m missing here… perhaps the potential should be written using theano variables and not pm.math functions?
That makes sense; I’ve tried, but did not succeed. If anyone has any suggestion regarding how to rewrite the potential function in a way that describes the function P(X) above, it’d be much appreciated!
I think sampling a data vector from the network above directly is quite difficult because of the partition function. One approach to circumvent that could be sampling data sequentially from the conditional distributions of the nodes given the states of the rest of the network.
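A minimal numpy sketch of that sequential scheme (a Gibbs-style sweep over the full conditionals; I am using the exp(+...) convention from the code above, and the names are made up):

import numpy as np

def gibbs_sample(alphas, betas, n_samples, n_burn=1000):
    n = len(alphas)
    x = np.random.randint(0, 2, size=n)
    samples = []
    for sweep in range(n_burn + n_samples):
        for i in range(n):
            # P(x_i = 1 | rest) = sigmoid(alpha_i + sum_{j != i} beta_ij * x_j)
            field = alphas[i] + betas[i] @ x - betas[i, i] * x[i]
            x[i] = np.random.rand() < 1.0 / (1.0 + np.exp(-field))
        if sweep >= n_burn:
            samples.append(x.copy())
    return np.array(samples)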
The idea is rooted in an article by Strauss (1992), who described the pseudolikelihood method to estimate the parameters of lattice models (see section 3.2 in that article). It basically boils down to a form of multivariate logistic regression that could possibly be implemented quite easily using pymc3. The main trick for estimation is setting up the proper design matrix in the logistic regression model.
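As a sketch of that procedure (shown here with scikit-learn rather than pymc3 for brevity; the function and variable names are my own):

import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_pseudolikelihood(X):
    """One logistic regression per node: regress x_i on all other nodes."""
    n_samples, n = X.shape
    alphas = np.zeros(n)
    betas = np.zeros((n, n))
    for i in range(n):
        others = [j for j in range(n) if j != i]
        # the design matrix is simply the states of the remaining nodes
        lr = LogisticRegression(C=1e6)  # large C, i.e. essentially unpenalized
        lr.fit(X[:, others], X[:, i])
        alphas[i] = lr.intercept_[0]
        betas[i, others] = lr.coef_[0]
    # each beta_ij is estimated twice (once from node i, once from node j)
    return alphas, (betas + betas.T) / 2.0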
However, I am also interested in whether it would be possible to use pymc3 to estimate the parameters of the model above directly, as I heard that the normalization (Z) would not be so much of a concern when using MCMC.
Literature:
Strauss, D. (1992). The many faces of logistic regression. The American Statistician, 46(4), 321–327.
This is exactly what I thought. Evaluating the partition function is a very hard problem (from what I understand, usually different approximations are computed instead, using the Wang-Landau method, for example); however, since it’s just a constant, I expected it not to have any effect on the sampling… so I still don’t understand why the inference doesn’t work. :-\
I know next to nothing about the context here, so maybe this is completely off, but could it be that the problem isn’t actually the second inference part in NUTS, but the data generation using BinaryMetropolis? If that hasn’t converged, then the posterior given the data from the trace might be nowhere near the right values.
There are only 16 possible values in the data generation part, so maybe it would be better to compute all those probabilities, normalize and draw random samples directly.
Edit: Nope, that’s not it.
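For reference, that direct draw for the generation step could look like this (a sketch using the alphas from above):

import itertools
import numpy as np

alphas = np.array([-1, -2, 0.5, 3.5])

# enumerate all 2**4 = 16 binary patterns
states = np.array(list(itertools.product([0, 1], repeat=4)))
logp = states.dot(alphas)          # unnormalized log-probabilities
p = np.exp(logp - logp.max())
p /= p.sum()                       # exact normalization over the 16 states

# draw i.i.d. samples instead of relying on BinaryMetropolis to converge
idx = np.random.choice(len(states), size=4000, p=p)
X_obs = states[idx]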
The problem seems to be that the normalization factor Z depends on the values of alpha, so you can’t actually ignore it during sampling.
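For a toy problem this small, Z can be written out explicitly inside the model. A sketch (assuming X_obs is the data matrix from the generation step; pm.math.logsumexp does the log-sum over all 16 states):

import itertools
import numpy as np
import pymc3 as pm

states = np.array(list(itertools.product([0, 1], repeat=4)))  # all 16 patterns

with pm.Model() as model:
    coeffs = pm.Normal('coeffs', mu=0, sd=1, shape=4)
    # exact log partition function: log sum_x exp(coeffs . x) over all states
    logZ = pm.math.logsumexp(pm.math.dot(states, coeffs))
    # full log-likelihood: each observed pattern contributes coeffs . x - log Z
    pm.Potential('loglike',
                 pm.math.dot(X_obs, coeffs).sum() - X_obs.shape[0] * logZ)
    trace = pm.sample(2000)

With the log Z term included, pushing the coefficients off towards infinity no longer increases the potential, so the trace should stay bounded.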
Explicitly representing the partition function $Z(\alpha, \beta)$ in the model could become complicated because the sum runs over all possible network states. For instance, with just 100 nodes we would already have $2^{100}$ possible network configurations. Not to speak of the 4950 possible connections between the nodes if we add the weights $\beta_{ij}$.
However, as the evaluated function $Z$ is basically scalar-valued, one could possibly write the independence model you suggested above as:
$$
p(x_1, \ldots, x_n) \propto \exp\left\{ -\sum_{i=1}^{n} \alpha_i x_i \right\}.
$$
An alternative formulation could be:
$$
p(x_1, \ldots, x_n) \propto \prod_{i=1}^{n} \exp\left\{ -\alpha_i x_i \right\}
$$
But I am not sure if this helps with implementing the model in pymc3 using the Potential class, as I am just starting out learning the package. What practically works for me with reasonable speed – for not too large networks – is the pseudolikelihood procedure using the conditional distributions.
Can you provide a code snippet/ref? Maybe I can try this instead; I’m looking for a solution that’ll work, not specifically for a potential-based one.
In the R world, there is at least one package on CRAN that can do this: IsingFit by van Borkulo et al. (2014). Reading the respective paper is recommended.
I also have an experimental R package (ludwig) on github using a related but slightly different approach.
The source for both packages is open, so just have a look.
Reference:
Borkulo, C. D. van, Borsboom, D., Epskamp, S., Blanken, T. F., Boschloo, L., Schoevers, R. A., & Waldorp, L. J. (2014). A new method for constructing networks from binary data. Scientific Reports, 4(5918), 1–10. doi:10.1038/srep05918