PyMC3: sampling does not cover the entire range of values between the bounds of a Uniform prior

I am using PyMC3 for parameter estimation with a custom likelihood function that has to be defined; here L is the analytic form of my likelihood function. I have observational data for the radial velocity (vr) and position (r) of some objects, imported from an Excel file. My priors are on M and beta, each assumed to be uniformly distributed.
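For reference, this is the likelihood the code below implements (read directly off the code; the model then passes tt.sum(-tt.log(L)) to DensityDist):

$$q = \frac{\gamma}{\beta - 2} - \frac{3}{2},\qquad B = \frac{G M \cdot 10^{12}}{(\beta - 2)\,R^{3-\beta}},\qquad K = \frac{\gamma - 3}{r_{\min}^{3-\gamma}\sqrt{2B}}$$

$$L(r, v_r \mid M, \beta) = K\left(1 - \frac{v_r^{2}\,r^{\beta-2}}{2B}\right)^{q+1} r^{\,1-\gamma+\beta/2}$$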

    import numpy as np
    import pandas
    import pymc3 as pm
    import theano
    import theano.tensor as tt

    # observational data: column 0 is position r, column 1 is radial velocity vr
    data_ = np.array(pandas.read_excel('aaa.xlsx', header=None))

    gamma = 3.77
    G = 4.302*10**-6
    rmin = 3.0
    R = 95.7

    basic_model = pm.Model()
    with basic_model:
        M = pm.Uniform('M', lower=0.5, upper=32.6)
        beta = pm.Uniform('beta', lower=2.001, upper=2.9)
        M_print = tt.printing.Print('M')(M)          # for debugging
        bet_print = tt.printing.Print('beta')(beta)  # for debugging
        vr = data_[:, 1]
        r = data_[:, 0]
        q = (gamma/(beta - 2)) - 3/2
        B = (G*M*10**12)/((beta - 2)*(R**(3 - beta)))
        K = (gamma - 3)/((rmin**(3 - gamma))*((2*B)**0.5))
        logp = -tt.log(K*((1 - ((1/(2*B))*((vr**2)*r**(beta - 2))))**(q + 1))*(r**(1 - gamma + (beta/2))))

        def logp_func(r, vr):
            return tt.sum(logp)

        logpvar = pm.DensityDist("logpvar", logp_func, observed={"r": r, "vr": vr})
        start = pm.find_MAP(model=basic_model)
        step = pm.NUTS()
        basicmodeltrace = pm.sample(100, step=step, start=start, tune=0, chains=1, progressbar=False)
        print(pm.summary(basicmodeltrace))
        map_estimate = pm.find_MAP(model=basic_model)
        print(map_estimate)

I got the following output:

    M __str__ = 16.55
    beta __str__ = 2.5
    logp = 850.88, ||grad|| = 0.0040978: 100%|██████████| 21/21 [00:00<00:00, 582.49it/s]
    C:\Users\mypc\Anaconda3\Anaconda_3\lib\site-packages\pymc3\step_methods\hmc\nuts.py:429:
    UserWarning: Chain 0 contains only 100 samples.
      % (self._chain_id, n))

    M:

      Mean             SD               MC Error         95% HPD interval
      -------------------------------------------------------------------
      31.591           1.159            0.116            [29.549, 32.600]

      Posterior quantiles:
      2.5            25             50             75             97.5
      |--------------|==============|==============|--------------|
      28.400         31.255         31.788         32.445         32.600

    beta:

      Mean             SD               MC Error         95% HPD interval
      -------------------------------------------------------------------
      2.001            0.000            0.000            [2.001, 2.001]

      Posterior quantiles:
      2.5            25             50             75             97.5
      |--------------|==============|==============|--------------|
      2.001          2.001          2.001          2.001          2.001

    None
    logp = 850.88, ||grad|| = 0.0040978: 100%|██████████| 21/21 [00:00<00:00, 511.86it/s]
    {'M_interval__': array(23.47882197665113), 'beta_interval__': array(-21.918992363596654),
     'M': array(32.6), 'beta': array(2.001000000271933)}

I printed the M and beta values with basicmodeltrace['M'] and basicmodeltrace['beta'].
The output was:
for beta:

    array([ 2.001     ,  2.001     ,  2.001     ,  2.00100002,  2.00100002,
            2.00100007, ........  2.00105687,  2.00101437,  2.0010741 ])

for M:

    array([ 32.6       ,  32.6       ,  32.6       ,  32.6       ,
            32.6       ,  32.58185367,  32.5394849 ,
            32.58355333, .......  32.11210355,  31.36900279])

The debug printing returned only one set of M and beta values; in fact, it doesn't match the number of iterations.

For more clarity, I plotted the 2D posterior:
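    import seaborn as sns  # standard seaborn alias

    sns.kdeplot(basicmodeltrace['M'], basicmodeltrace['beta'])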

The figure clearly shows that logp is evaluated only over a very small range of M and beta (the priors). I need the sampling to cover the whole range of values between the lower and upper bounds of the priors. What I am getting is a posterior concentrated around roughly the same numbers (the lower bound of beta and the upper bound of M). The situation is the same for a large number of samples. I don't understand why this is happening.

Reading from the output, it seems the PyMC3 version you are using is a bit old. Could you please first try updating to the newest version (e.g., pip install --upgrade pymc3), and then replace:

    start = pm.find_MAP(model=basic_model)
    step = pm.NUTS()
    basicmodeltrace = pm.sample(100,step=step,start=start,tune=0, chains=1, progressbar=False)

with

    basicmodeltrace = pm.sample(1000, tune=1000)

It would be great if you can paste the output of pm.traceplot(basicmodeltrace) here.

If the problem still persists, please upload the data file somewhere and I will have a more in-depth look.

I have tried replacing it as you suggested. The problem still persists. The output of pm.traceplot(basicmodeltrace) is below:

And the pm.summary(basicmodeltrace) is below:

    M __str__ = 16.55
    beta __str__ = 2.5
    Auto-assigning NUTS sampler...
    Initializing NUTS using jitter+adapt_diag...
    100%|██████████| 2000/2000 [00:09<00:00, 203.32it/s]

    M:

      Mean             SD               MC Error         95% HPD interval
      -------------------------------------------------------------------
      31.510           1.064            0.035            [29.247, 32.600]

      Posterior quantiles:
      2.5            25             50             75             97.5
      |--------------|==============|==============|--------------|
      28.732         31.035         31.882         32.302         32.569

    beta:

      Mean             SD               MC Error         95% HPD interval
      -------------------------------------------------------------------
      2.105            0.006            0.000            [2.100, 2.116]

      Posterior quantiles:
      2.5            25             50             75             97.5
      |--------------|==============|==============|--------------|
      2.100          2.101          2.103          2.107          2.119

    None
    logp = 708.29, ||grad|| = 0.0035327: 100%|██████████| 21/21 [00:00<00:00, 58.35it/s]
    {'M_interval__': array(15.40044898365082), 'beta_interval__': array(15.752273826144563),
     'M': array(32.59999342077209), 'beta': array(2.899999884664149)}

This is the dataset I have been working with (I converted the Excel file into a text file):

aaa.txt (563 Bytes)

The traces seem fine. Since your prior is uniform and the likelihood is 2D, you can evaluate the posterior directly on a grid and validate it against the MCMC samples:

    import numpy as np
    import matplotlib.pyplot as plt

    trace = pm.sample(1000, tune=1000, njobs=4, model=basic_model)

    # evaluate the (unnormalized) log-posterior on a grid over the prior box;
    # gamma, G, R, rmin, vr and r are the globals defined earlier
    def cust_logp(z):
        M = z[0]
        beta = z[1]
        q = gamma/(beta - 2) - 3/2
        B = (G*M*10**12)/((beta - 2)*(R**(3 - beta)))
        K = (gamma - 3)/((rmin**(3 - gamma))*((2*B)**0.5))
        logp = -np.log(K*((1 - ((1/(2*B))*((vr**2)*r**(beta - 2))))**(q + 1))*(r**(1 - gamma + (beta/2))))
        return np.sum(logp)

    grid = np.mgrid[32.6:.5:100j, 2.001:2.9:100j]  # (M, beta) grid over the prior bounds
    Z = np.asarray([cust_logp(g) for g in grid.reshape(2, -1).T])
    top, left = grid[:, 0, 0]
    bottom, right = grid[:, -1, -1]

    _, ax = plt.subplots(1, 2, figsize=(10, 5))
    ax[0].imshow(Z.reshape(100, 100), extent=[left, right, bottom, top], aspect='auto')
    ax[1].scatter(trace['beta'], trace['M'])
    ax[1].set_xlim([2.001, 2.9])
    ax[1].set_ylim([.5, 32.6])
    plt.tight_layout()

As you can see, the samples are concentrated in the area with the highest posterior volume (top left). An efficient sampler should draw mostly from the typical set, and in this case it is working as intended.

If this is not the result you have in mind, you should double check your logp function and the prior (e.g., whether the bounds are sensible).
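A quick way to run such a check (a minimal sketch, assuming PyMC3's test_point/logp interface; the total logp at the starting point should be finite):

    with basic_model:
        # if this prints nan or -inf, the logp or the bounds are suspect
        print(basic_model.logp(basic_model.test_point))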

So MCMC won't sample throughout all the possible values between the bounds of the priors? I want to run the sampler over all possible values of M and beta, maximise the likelihood function, and hence find the corresponding M and beta values. I have done this with the maximum likelihood method and I got sensible results. I tried the Bayesian approach since it gives a more accurate and acceptable solution. From my outputs and from printing the posterior, I can see that the iterations stay around roughly the same numbers. Is there any method to change the step size?

I think you might be confusing a few concepts here. I recommend you read https://arxiv.org/pdf/1701.02434.pdf, which explains quite well the idea of the typical set (where you would like the sampler to take samples from) and how MCMC (in the context of the above-mentioned paper, HMC) achieves that efficiently.

No. An analogy: if your parameter is on the real line (-∞, ∞), you would not expect to see samples from -∞ to ∞.

Yes, you can specify the step size for the different samplers. But usually, step sizes are optimized internally during the tuning samples, and in your case reducing the step size will not do what you want (i.e., uniformly explore the whole space).
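For reference, a minimal sketch of how a step-size-related argument can be passed to NUTS in PyMC3 (target_accept indirectly controls the adapted step size; as said above, this will not make the sampler sweep the prior uniformly):

    with basic_model:
        # a higher target_accept leads to a smaller adapted step size
        step = pm.NUTS(target_accept=0.9)
        trace = pm.sample(1000, tune=1000, step=step)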

One more thing to add: in your use case, where the parameter space is low-dimensional (2D), maximum likelihood estimation is usually quite good. However, that is generally not the case in high dimensions. Again, please see the paper https://arxiv.org/pdf/1701.02434.pdf for more information.

I want to introduce a numerical integration that actually contains one of the parameters, beta. I have done it by defining a custom Theano op, but I get an error message.

This is the custom Theano op I used:

    from scipy.integrate import quad

    class integrateOut(theano.Op):
        def __init__(self, f, t, t0, tf, *args, **kwargs):
            super(integrateOut, self).__init__()
            self.f = f    # symbolic integrand
            self.t = t    # variable integrated out
            self.t0 = t0  # lower limit
            self.tf = tf  # upper limit

        def make_node(self, *inputs):
            self.fvars = list(inputs)
            try:
                self.gradF = tt.grad(self.f, self.fvars)
            except Exception:
                self.gradF = None
            return theano.Apply(self, self.fvars, [tt.dscalar().type()])

        def perform(self, node, inputs, output_storage):
            args = tuple(inputs)
            f = theano.function([self.t] + self.fvars, self.f, mode='DebugMode')
            output_storage[0][0] = quad(f, self.t0, self.tf, args=args)[0]

        def grad(self, inputs, grads):
            return [integrateOut(g, self.t, self.t0, self.tf)(*inputs)*grads[0]
                    for g in self.gradF]

I defined the integration outside my model:

    def integ(gamma, beta):
        z = tt.cos(theta)**(2*((gamma/(beta - 2)) - 3/2) + 3)
        return integrateOut(z, theta, -(np.pi)/2, (np.pi)/2)(beta)

    with pm.Model() as basic_model:
        M = pm.Uniform('M', lower=10**8, upper=10**13)
        beta = pm.Uniform('beta', lower=2.001, upper=2.999)
        theta = pm.Normal('theta', 0, 10**2)
        M_print = tt.printing.Print('M')(M)
        bet_print = tt.printing.Print('beta')(beta)

        def logp_func(rn, vn):
            q = (gamma/(beta - 2)) - 3/2
            B = (G*M)/((beta - 2)*(R**(3 - beta)))
            K = (gamma - 3)/((rmin**(3 - gamma))*(2*B)**0.5)*integ(gamma, beta)
            logp = -tt.log(K*((1 - ((1/(2*B))*((vn**2)*rn**(beta - 2))))**(q + 1))*(rn**(1 - gamma + (beta/2))))
            return tt.sum(logp)

        # rn and vn are the observed position and velocity arrays, as before
        logpvar = pm.DensityDist("logpvar", logp_func, observed={"rn": rn, "vn": vn})
        trace = pm.sample(1000, tune=1000)
        print(pm.summary(trace))
        map_estimate = pm.find_MAP(model=basic_model)
        print(map_estimate)

But this returns an error message:

    NotImplementedError: input nd
    Apply node that caused the error: InplaceDimShuffle{x}(<__main__.integrateOut object at 0x0000008F108681D0>.0)
    Toposort index: 11
    Inputs types: [TensorType(float64, scalar)]
    Inputs shapes: ['No shapes']
    Inputs strides: ['No strides']
    Inputs values: [0.6348844212308064]
    Outputs clients: [[Elemwise{Composite{log(((i0 * i1 * ((i2 - ((i3 * i4 * i5 * i6 * (i7 ** i4)) / i8)) ** i9) * (i7 ** i10)) / i11))}}(TensorConstant{(1,) of 1...9421338119}, InplaceDimShuffle{x}.0, TensorConstant{(1,) of 1.0}, TensorConstant{(1,) of 11..225.011623}, Elemwise{add,no_inplace}.0, Elemwise{Composite{(i0 ** (i1 - i2))}}.0, TensorConstant{[  1.26045..73344e+04]}, TensorConstant{[  2.7405 ..  99.528 ]}, InplaceDimShuffle{x}.0, Elemwise{Composite{(i0 + (i1 / i2))}}.0, Elemwise{Composite{(i0 + (i1 * i2))}}.0, Elemwise{Composite{sqrt(((i0 * i1) / (i2 * i3)))}}.0)]]

    HINT: Re-running with most Theano optimization disabled could give you a back-trace of when this node was created. This can be done by setting the Theano flag 'optimizer=fast_compile'. If that does not work, Theano optimizations can be disabled with 'optimizer=None'.
    HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.

How can I solve this issue?