Error in the sampling process: 'sd' not taken into account


I am still trying to make PyMC3 work with external models. After some tests where my chain was stationary, I decided to go back to an example I had written at the beginning to understand how PyMC3 works: the good old linear regression from the documentation, except with different sd parameters for the priors.

I suspected that the sampler was not proposing values using the sd of 1000 I had specified for the prior, but rather something close to a seemingly default value of 1.

To actually see the values of the prior parameters that Metropolis (the sampler I am interested in) was sampling, I resorted to defining an @as_op Theano operator and calling the linear function from an external file so that I could print out values (in parallel, it also helped me see how to call an external model, along with issue #1925).
I used this example from Bountysource and modified it as shown below.
I call the external model containing the linear function of the prior parameters in a separate file, where I added a print() to see the value of the parameter alpha at each step.
By the way, if there is a way to do this in PyMC3 without such a convoluted approach, I'm all ears.

Now, the problem:
If you run the code below, apart from the fact that it runs about 4000 times slower than the standard version (due to the read/write and external subprocess calls), you can see that the values of alpha tried by the sampler are close to +/-1, even though I specified an sd parameter of 1000 for alpha. This is not what one should expect if I understood correctly how the Metropolis algorithm works: it should draw the tried values from the proposed prior distributions.
Now, if you go into the PyMC3 code and change the default scaling factor in Metropolis from 1 to 1000, i.e. change

def __init__(self, vars=None, S=None, proposal_dist=None, scaling=1., tune=True, tune_interval=100, model=None, mode=None, **kwargs):

to

def __init__(self, vars=None, S=None, proposal_dist=None, scaling=1000., tune=True, tune_interval=100, model=None, mode=None, **kwargs):

and rerun the code, it works: the values of alpha actually tried are around +/-1000.

It is as if the sampler does not care about what you input as the sd parameter. Is this normal? Or is there something I completely overlooked?

Main code:

import numpy as np
import matplotlib.pyplot as plt
import theano.tensor as t
import theano
import json
import subprocess
import pymc3 as pm

#%% Simulate data
# Initialize random number generator (seed value chosen arbitrarily)
np.random.seed(123)

# True parameter values
alpha, sigma = 1, 1
beta = [1, 2.5]

# Size of dataset
size = 100

# Predictor variable
X1 = np.random.randn(size)
X2 = np.random.randn(size) * 0.2

# Simulate outcome variable
Y = alpha + beta[0]*X1 + beta[1]*X2 + np.random.randn(size)*sigma

# Plot data
fig, axes = plt.subplots(1, 2, sharex=True, figsize=(10,4))
axes[0].scatter(X1, Y)
axes[1].scatter(X2, Y)
axes[0].set_ylabel('Y'); axes[0].set_xlabel('X1'); axes[1].set_xlabel('X2');


#%% External function call
i = 0
from theano.compile.ops import as_op

@as_op(itypes=[t.dscalar, t.dscalar, t.dscalar], otypes=[t.dvector])
def proc_test(alpha, beta1, beta2):
    #### write an input file
    global X1, X2, i

    params = {}
    params['X1'] = X1.tolist()  # ndarrays are not JSON-serializable, so convert to lists
    params['X2'] = X2.tolist()

    params['alpha'] = float(alpha)
    params['beta1'] = float(beta1)
    params['beta2'] = float(beta2)

    with open("input.txt", 'w') as outfile:
        json.dump(params, outfile)

    #### make a system call to the external model script
    # (the script name is assumed here; use whatever name the external file below is saved under)
    subprocess.Popen("python external_model.py input.txt", shell=True).wait()

    # Count the function evaluations, printing every 100th (NOT necessary, just checking)
    i += 1
    if i % 100 == 0:
        print(" number of evaluations {}".format(i))

    #### read the output file and return the value
    mu = np.loadtxt("output.txt")
    return mu


#%% Model definition
with pm.Model() as model: # model specifications in PyMC3 are wrapped in a with-statement

    # Priors for unknown model parameters
    alpha = pm.Normal('alpha', mu=0, sd=1000)
    beta1 = pm.Normal('beta1', mu=0, sd=10)
    beta2 = pm.Normal('beta2', mu=0, sd=10)
    sigma = pm.HalfNormal('sigma', sd=1)

    # Expected value of outcome, computed by the external model
    mu = proc_test(alpha, beta1, beta2)

    # Likelihood (sampling distribution) of observations
    Y_obs = pm.Normal('Y_obs', mu=mu, sd=sigma, observed=Y)

    # Inference!
    #start = pm.find_MAP()  # Find starting value by optimization
    step = pm.Metropolis()
    trace = pm.sample(100, step=step)

#plot results
plt.figure(figsize=(12, 12))

And the file containing the external model:

import sys
import numpy as np
import json

def main(argv):

    # provide an input file from the command line
    input_file = argv[1]

    # read in the input file
    with open(input_file) as json_data:
        params = json.load(json_data)

    X1 = np.asarray(params["X1"], dtype=float)
    X2 = np.asarray(params["X2"], dtype=float)

    # the generative model
    mu = float(params["alpha"]) + float(params["beta1"])*X1 + float(params["beta2"])*X2

    # here the print to check the value of alpha at each step
    print("alpha = {}".format(params["alpha"]))

    # write the results to an output file
    np.savetxt("output.txt", mu)


if __name__ == "__main__":
    main(sys.argv)

Hi @viiv, I think there are some subtle misunderstandings about sampling that lead to this confusion. The way you think of sampling (i.e., if alpha ~ Normal(0, 1000), then the sampler will generate Normal(0, 1000) candidates and evaluate them) is not how Markov chain Monte Carlo samples. Say you start with an initial value alpha = .1: if the typical set of alpha is around the value 1., the chain will just move towards 1 (.1 → .5 → 1.1 …) and then explore the region there. It will not explore values too far away from 1, even though the prior is vague (large sd).
The scaling in Metropolis is a different thing: it controls the size of the proposal jump at each step. If a proposed value is too far away (e.g., .1 → 1000.), it leads to a low logp, and the proposal will simply be rejected.
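To illustrate the difference, here is a toy random-walk Metropolis sketch in plain numpy (not PyMC3's actual implementation; the data and prior values are made up to mirror the question). The proposal jump is drawn using the scaling, not the prior sd, so even under a Normal(0, 1000) prior the chain settles in the likelihood's typical set around 1:

```python
import numpy as np

# Toy random-walk Metropolis for a single parameter alpha.
# Prior: Normal(0, sd=1000); likelihood: y_i ~ Normal(alpha, 1).
# The proposal step uses `scaling`, NOT the prior sd, which is exactly
# the behaviour observed in the question.

rng = np.random.RandomState(0)
y = 1.0 + rng.randn(100)          # fake data with true alpha = 1

def logp(alpha, prior_sd=1000.0):
    log_prior = -0.5 * (alpha / prior_sd) ** 2
    log_like = -0.5 * np.sum((y - alpha) ** 2)
    return log_prior + log_like

def metropolis(n_steps, scaling=1.0, alpha0=0.1):
    chain = [alpha0]
    for _ in range(n_steps):
        current = chain[-1]
        proposal = current + scaling * rng.randn()   # jump drawn with sd = scaling
        if np.log(rng.rand()) < logp(proposal) - logp(current):
            chain.append(proposal)                   # accept
        else:
            chain.append(current)                    # reject: chain stays put
    return np.array(chain)

chain = metropolis(2000)
# Despite the vague prior (sd=1000), the chain stays near alpha = 1
# and never visits values anywhere near +/-1000.
print(chain[1000:].mean(), np.abs(chain).max())
```

Changing `scaling` changes how far each jump reaches; changing the prior sd barely changes where the chain settles, because the 100 observations dominate the vague prior.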

Essentially, what you observed:

is an indication that your posterior (remember, the values of alpha you see now are actually posterior samples) is not sensitive to your prior.
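To make "not sensitive to your prior" concrete, here is a small numerical check using the conjugate normal-normal formulas (a sketch with made-up data, assuming a known sigma = 1): with 100 observations, the posterior for alpha is essentially the same whether the prior sd is 10 or 1000.

```python
import numpy as np

rng = np.random.RandomState(42)
y = 1.0 + rng.randn(100)   # 100 observations with true alpha = 1, sigma = 1

def posterior(prior_sd, sigma=1.0):
    # Conjugate normal-normal update for the mean alpha with known sigma:
    # precisions add, and the posterior mean is a precision-weighted average
    # of the prior mean (0) and the data mean.
    n = len(y)
    post_prec = 1.0 / prior_sd**2 + n / sigma**2
    post_mean = (n * y.mean() / sigma**2) / post_prec
    post_sd = np.sqrt(1.0 / post_prec)
    return post_mean, post_sd

m10, s10 = posterior(prior_sd=10.0)
m1000, s1000 = posterior(prior_sd=1000.0)
print(m10, s10)      # ~ (y.mean(), 0.1)
print(m1000, s1000)  # nearly identical: the data dominate the vague prior
```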

For more info on MCMC you can check my blog post:

Thank you for the reply. I know this is how Metropolis is supposed to work, but in my case, for instance, when I try to sample from my complicated external model with fake data where the mean is at 6000, using mu=5500 and sd=1000 for a Normal prior, it constantly keeps trying values around 5499 or 5501 for hundreds of samples, and then, I guess, always rejects the proposal, since it comes back to 5500 at each 'link' of the chain (I see it from the output in text format with db=…).
Is this normal? I would expect it to try values at least within 5500 +/- 1000, so it might realize that around 6000 the fit is good.

Also, as I was asking: is there a way, other than what I did, to check ALL the steps taken by the Metropolis algorithm (not only the one output 'link' of the chain)?
Because in my example, for each 'link' it does 3 'steps' (= 3 function calls), where I see that it tries 5500.0, then 5501.xx…, and then goes back to 5500.0. Each 'link' is made of 3 steps, and it seems stuck in this infinite loop. Any idea why this might happen? I had to use the debugger to see all that, and I could not figure out why some steps were rejected or not, because the debugger jumps around a little erratically (hence the need to see all info at all steps :slight_smile: ).

@twiecki Thank you, I had actually found your very informative post indeed, as well as many others, but I guess after trying so many things to make my code work I got a bit confused :slight_smile:

This is usually an indication that Metropolis is inefficient in high dimensions: since all the large jumps get rejected, the algorithm tunes itself to take very small steps, or even gets completely "stuck".

You mean outputting all the proposed steps? I guess you can modify Metropolis to output every proposed step and compare them alongside the trace.
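For example, one way to see every proposed step (sketched here on a toy random-walk Metropolis in plain numpy, rather than by patching PyMC3 internals) is to record each proposal together with its accept/reject decision:

```python
import numpy as np

rng = np.random.RandomState(1)
y = 1.0 + rng.randn(100)   # toy data with true mean 1, sigma 1

def logp(alpha):
    # likelihood only; a vague prior would add a negligible term
    return -0.5 * np.sum((y - alpha) ** 2)

def metropolis_logged(n_steps, scaling=1.0, alpha0=0.0):
    log = []               # every (proposal, accepted?) pair, not just the trace
    trace = []
    current = alpha0
    for _ in range(n_steps):
        proposal = current + scaling * rng.randn()
        accepted = np.log(rng.rand()) < logp(proposal) - logp(current)
        log.append((proposal, bool(accepted)))
        if accepted:
            current = proposal
        trace.append(current)
    return np.array(trace), log

trace, log = metropolis_logged(200)
rejected = [p for p, ok in log if not ok]
# All 200 proposals are visible here, including the rejected ones that
# never show up in the trace itself.
print(len(log), len(rejected))
```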

Also, I am not sure the function-evaluation count is correct: from the printed output I only see that at each step it proposed one new alpha value, but printed it out multiple times.

Did you try experimenting with the scaling? Try e.g. pm.Metropolis(scaling=100).


Thanks a lot. I have been experimenting with that over the last few days, and I found that I get many NaNs or Infs, so I have to deal with that first.
Probably my model is not well defined; I tried changing some Normals to HalfNormals, but I have another problem that will require opening another question.

Thanks for the suggestion.
This is the parameter I was looking for, I guess. I tried it, and it indeed looks like it extends the search perimeter around the starting value.
I thought that was the role of the sd variable, or of sigma, but I guess not.
As I said to @junpenglao, I have other troubles with my model that I need to sort out before I can try this fully, so I will open a new question to address that specific problem.
I will come back here to post results once it is solved.