Debugging models when sampling is stuck at 0%

I’m having problems sampling my model: sampling simply won’t start and the progress bar stays stuck at 0%. I’ve tried both NUTS and Metropolis and neither works. The `find_MAP` method works fine and the MAP parameters are sensible.

I’d like to know how to debug these kinds of problems. At the moment I’m printing logp for each of the parameters and the initial log posterior, and I’ve also tried profiling logp and its gradient using `model.profile`. It all looks fine.
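
Concretely, the checks I’m running look something like this (a sketch, with `model` standing for the model instance):

```python
# Log-probability of each free RV and the joint log posterior at the test point
print(model.check_test_point())

# Profile the joint log-probability; the gradient can be profiled the same way
model.profile(model.logpt).summary()
```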

Here’s the model output:

```
Optimizing for MAP parameters:
success: True
initial logp: -586.9534773212665
final logp: -286.5252023190529
{'t0_interval__': array(1.47898696), 'ln_teff_ln_tE': array([2.97053136, 4.88569092]), 'K_lowerbound__': array(0.05436406), 't0': array(7871.48319146), 'tE': array(132.38189842), 'u0': array(0.14731833), 'K': array(2.05586893), 'logp_t0': array(-6.89936818), 'logp_u0': array(-0.92978988), 'logp_tE': array(-7.34020842), 'logp_K': array(-1.75144311), 'log_posterior': array(-0.76934081)}

Sampling using NUTS:
Sampling 4 chains:   0%|          | 0/808 [00:00<?, ?draws/s]
```

Seems like it is a multi-chain sampling problem. Try doing `trace = pm.sample(..., cores=1)`.


Indeed it is. It works fine with `cores=1`. Do you have any idea how to fix this? The difference between this model and others that I’ve sampled successfully with multiple chains is minimal, so I’m not sure why it’s suddenly failing.

Hmmm, so it works on other models? That could be a memory issue.

Are you thinking that the call to pickle, which we use for the multiprocessing, is the culprit? I’m honestly not sure whether the observed data gets replicated and whether that might cause such a problem.
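
One quick way to test that would be to pickle the model by hand (a sketch, assuming the marginalized model instance is called `model`):

```python
import pickle

# If multiprocessing chokes on serialization, pickling the model directly should
# either fail outright or produce a suspiciously large blob (e.g. if the observed
# data ends up copied for every chain).
blob = pickle.dumps(model)
print(f"pickled OK, {len(blob) / 1e6:.1f} MB")
```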

Memory usage seems fine; I have 32GB of RAM on my machine.

Here are the two models; the marginalized version fails with multiple cores.

```python
import numpy as np
import pymc3 as pm
import theano.tensor as T


class PointSourcePointLens(pm.Model):
      def __init__(self, data, use_joint_prior=True, name='', model=None):
          super(PointSourcePointLens, self).__init__(name, model)

          # Load and pre-process the data 
          data.convert_data_to_fluxes()
          df = data.get_standardized_data()
          self.t = df['HJD - 2450000'].values
          self.F = df['I_flux'].values
          self.sigF = df['I_flux_err'].values

          # Define bounded prior distributions 
          BoundedNormal = pm.Bound(pm.Normal, lower=0.0) # Delta_F is positive
          BoundedNormal1 = pm.Bound(pm.Normal, lower=1.) 

          # Define model parameters and their associated priors
          self.Delta_F = BoundedNormal('Delta_F', mu=np.max(self.F), sd=1., testval=3.)
          self.F_base = pm.Normal('F_base', mu=0., sd=0.1, testval=0.)

          ## Posterior is multi-modal in t0 and it's critical that it is
          ## initialized near the true value
          t0_guess_idx = (np.abs(self.F - np.max(self.F))).argmin() 
          self.t0 = pm.Uniform('t0', self.t[0], self.t[-1], 
              testval=self.t[t0_guess_idx])

          self.ln_teff_ln_tE = pm.DensityDist('ln_teff_ln_tE', 
              self.prior_ln_teff_ln_tE, shape=2, 
              testval = [np.log(10), np.log(20.)]) # p(ln_teff,ln_tE)

          # Deterministic transformations
          self.tE = pm.Deterministic("tE", T.exp(self.ln_teff_ln_tE[1]))
          self.u0 = pm.Deterministic("u0", 
              T.exp(self.ln_teff_ln_tE[0])/self.tE) 

          # Noise model parameters
          self.K = BoundedNormal1('K', mu=1., sd=2., testval=1.5)

          # Define helpful class attributes
          self.free_parameters = [RV.name for RV in self.basic_RVs]
          self.initial_logps = [RV.logp(self.test_point) for RV in self.basic_RVs]

          # Define the likelihood function
          Y_obs = pm.Normal('Y_obs', mu=self.mean_function(), sd=self.K*self.sigF, 
              observed=self.F, shape=len(self.F))

      def mean_function(self):
          """Return the mean function which goes into the likeliood."""
          u = T.sqrt(self.u0**2 + ((self.t - self.t0)/self.tE)**2)
          A = lambda u: (u**2 + 2)/(u*T.sqrt(u**2 + 4))

          return self.Delta_F*(A(u) - 1)/(A(self.u0) - 1) + self.F_base

      def prior_ln_teff_ln_tE(self, value):
          """Returns the log of a custom joint prior p(ln_Delta_F, ln_F_base)."""
          teff = T.cast(T.exp(value[0]), 'float64')
          tE = T.cast(T.exp(value[1]), 'float64')
          sig_tE = T.cast(365., 'float64') # p(tE)~N(0, 365)
          sig_u0 = T.cast(1., 'float64') # p(u0)~N(0, 1)

          return -T.log(tE) - (teff/tE)**2/sig_u0**2 - tE**2/sig_tE**2 +\
              value[0] + value[1]

      def peak_mag(self):
          """Returns PSPL magnification at u=u0."""
          u = T.sqrt(self.u0**2 + ((self.t - self.t0)/self.tE)**2)
          A = lambda u: (u**2 + 2)/(u*T.sqrt(u**2 + 4))

          return A(self.u0)
```

```python
class PointSourcePointLensMarginalized(pm.Model):
      def __init__(self, data, use_joint_prior=True, name='', model=None):
          super(PointSourcePointLensMarginalized, self).__init__(name, model)

          # Load and pre-process the data 
          data.convert_data_to_fluxes()
          df = data.get_standardized_data()
          self.t = df['HJD - 2450000'].values
          self.F = df['I_flux'].values
          self.sigF = df['I_flux_err'].values

          # Define bounded prior distributions 
          BoundedNormal = pm.Bound(pm.Normal, lower=0.0) 
          BoundedNormal1 = pm.Bound(pm.Normal, lower=1.) 

          # Define model parameters and their associated priors
          ## Posterior is multi-modal in t0 and it's critical that it is
          ## initialized near the true value
          t0_guess_idx = (np.abs(self.F - np.max(self.F))).argmin() 
          self.t0 = pm.Uniform('t0', self.t[0], self.t[-1], 
              testval=self.t[t0_guess_idx])

          self.ln_teff_ln_tE = pm.DensityDist('ln_teff_ln_tE', 
              self.prior_ln_teff_ln_tE, shape=2, 
              testval = [np.log(10), np.log(20.)]) # p(ln_teff,ln_tE)

          # Deterministic transformations
          self.tE = pm.Deterministic("tE", T.exp(self.ln_teff_ln_tE[1]))
          self.u0 = pm.Deterministic("u0", 
              T.exp(self.ln_teff_ln_tE[0])/self.tE) 

          # Noise model parameters
          self.K = BoundedNormal1('K', mu=1., sd=2., testval=1.5)

          # Define helpful class attributes
          self.free_parameters = [RV.name for RV in self.basic_RVs]
          self.initial_logps = [RV.logp(self.test_point) for RV in self.basic_RVs]

          # Define the likelihood function
          pm.Potential('likelihood', self.marginalized_likelihood())

      def marginalized_likelihood(self):
          """Gaussian likelihood function marginalized over the linear parameters.
          """
          N = len(self.F)        
          F = T._shared(self.F)

          # Linear parameter matrix
          mag_vector = self.magnification()
          A = T.stack([mag_vector, T.ones(N)], axis=1)

          # Covariance matrix
          C_diag = T.pow(self.K*T._shared(self.sigF), 2.)
          C = T.nlinalg.diag(C_diag)

          # Prior matrix
          sigDelta_F = 10.
          sigF_base = 0.1
          L_diag = T._shared(np.array([sigDelta_F, sigF_base])**2.)
          L = T.nlinalg.diag(L_diag)

          # Calculate inverse of covariance matrix for marginalized likelihood
          inv_C = T.nlinalg.diag(T.pow(C_diag, -1.))
          inv_L = T.nlinalg.diag(T.pow(L_diag, -1.))
          term1 = T.dot(A.transpose(), inv_C) 
          term2 = inv_L + T.dot(A.transpose(), T.dot(inv_C, A))
          term3 = T.dot(inv_C, A)
          inv_SIGMA = inv_C - T.dot(term3, T.dot(T.nlinalg.matrix_inverse(term2),
               term1))

          # Calculate determinant of covariance matrix for marginalized likelihood
          det_C = C_diag.prod() 
          det_L = L_diag.prod() 
          det_SIGMA = det_C*det_L*T.nlinalg.det(term2)

          # Calculate marginalized likelihood
          return -0.5*T.dot(F.transpose(), T.dot(inv_SIGMA, F)) -\
                 0.5*N*np.log(2*np.pi) - 0.5*np.log(det_SIGMA)

      def magnification(self):
          """Return the mean function which goes into the likeliood."""
          u = T.sqrt(self.u0**2 + ((self.t - self.t0)/self.tE)**2)
          A = lambda u: (u**2 + 2)/(u*T.sqrt(u**2 + 4))

          return (A(u) - 1)/(A(self.u0) - 1) 

      def prior_ln_teff_ln_tE(self, value):
          """Returns the log of a custom joint prior p(ln_teff, ln_tE)."""
          teff = T.cast(T.exp(value[0]), 'float64')
          tE = T.cast(T.exp(value[1]), 'float64')
          sig_tE = T.cast(365., 'float64') # p(tE)~N(0, 365)
          sig_u0 = T.cast(1., 'float64') # p(u0)~N(0, 1)

          return -T.log(tE) - (teff/tE)**2/sig_u0**2 - tE**2/sig_tE**2 +\
              value[0] + value[1]

      def peak_mag(self):
          """Returns PSPL magnification at u=u0."""
          u = T.sqrt(self.u0**2 + ((self.t - self.t0)/self.tE)**2)
          A = lambda u: (u**2 + 2)/(u*T.sqrt(u**2 + 4))

          return A(self.u0)
```
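
For reference, the `marginalized_likelihood` method above is meant to implement the standard linear-Gaussian marginalization (Woodbury identity plus the matrix determinant lemma), with `A` the design matrix, `C` the diagonal noise covariance and `L` the prior covariance of the linear flux parameters:

$$
\Sigma = C + A L A^\top, \qquad
\Sigma^{-1} = C^{-1} - C^{-1} A \left(L^{-1} + A^\top C^{-1} A\right)^{-1} A^\top C^{-1},
$$

$$
\det \Sigma = \det C \,\det L \,\det\!\left(L^{-1} + A^\top C^{-1} A\right), \qquad
\log p(F) = -\tfrac{1}{2} F^\top \Sigma^{-1} F - \tfrac{1}{2}\log\det\Sigma - \tfrac{N}{2}\log 2\pi .
$$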

There’s a chance the problem is with the lambdas in some of those methods, though I’d expect an exception rather than just hanging.
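
If you want to rule that out, one option is to move the amplification lambda into a module-level helper, since top-level functions serialize cleanly (a sketch; `pspl_amplification` is just an illustrative name):

```python
import theano.tensor as T

def pspl_amplification(u):
    """Point-source point-lens amplification A(u) = (u**2 + 2)/(u*sqrt(u**2 + 4))."""
    return (u**2 + 2) / (u * T.sqrt(u**2 + 4))
```

`mean_function`, `magnification` and `peak_mag` would then call `pspl_amplification(...)` instead of defining the local `A = lambda u: ...`.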

As an aside, subclassing Model is very cool!