Debugging models when sampling is stuck at 0%

I’m having problems sampling my model: sampling simply won’t start and the progress bar stays stuck at 0%. I’ve tried both NUTS and Metropolis and neither works. The `find_MAP` method works fine and the MAP parameters are sensible.

I’d like to know how to debug these kinds of problems. At the moment I’m printing logp for each of the parameters and the initial log posterior, and I’ve also tried profiling logp and its gradient using `model.profile`. It all looks fine.
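
Concretely, the checks I’m running look something like this (a sketch, with `model` standing for the model instance):

```python
# Log-probability of each free RV and the joint log posterior at the test point
print(model.check_test_point())

# Profile the joint log-probability; the gradient can be profiled the same way
model.profile(model.logpt).summary()
```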

Here’s the model output:

```
Optimizing for MAP parameters:
success: True
initial logp: -586.9534773212665
final logp: -286.5252023190529
{'t0_interval__': array(1.47898696), 'ln_teff_ln_tE': array([2.97053136, 4.88569092]), 'K_lowerbound__': array(0.05436406), 't0': array(7871.48319146), 'tE': array(132.38189842), 'u0': array(0.14731833), 'K': array(2.05586893), 'logp_t0': array(-6.89936818), 'logp_u0': array(-0.92978988), 'logp_tE': array(-7.34020842), 'logp_K': array(-1.75144311), 'log_posterior': array(-0.76934081)}

Sampling using NUTS:
Sampling 4 chains:   0%|          | 0/808 [00:00<?, ?draws/s]
```

Seems like it is a multi-chain sampling problem. Try doing `trace = pm.sample(..., cores=1)`.


Indeed it is. It works fine with `cores=1`. Do you have any idea how to fix this? The difference between this model and others that I’ve sampled successfully with multiple chains is minimal, so I’m not sure why it’s suddenly failing.

Hmmm, so it works on other models? That could be a memory issue.

Are you thinking that the call to pickle, which we use for the multiprocessing, is the culprit? I’m honestly not sure whether the observed data gets replicated and whether that might cause such a problem.
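
One quick way to test that would be to pickle the model by hand (a sketch, assuming the marginalized model instance is called `model`):

```python
import pickle

# If multiprocessing chokes on serialization, pickling the model directly should
# either fail outright or produce a suspiciously large blob (e.g. if the observed
# data ends up copied for every chain).
blob = pickle.dumps(model)
print(f"pickled OK, {len(blob) / 1e6:.1f} MB")
```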

Memory usage seems fine; I have 32GB of RAM on my machine.

Here are the two models; the marginalized version fails with multiple cores.

```python
import numpy as np
import pymc3 as pm
import theano.tensor as T


class PointSourcePointLens(pm.Model):
      def __init__(self, data, use_joint_prior=True, name='', model=None):
          super(PointSourcePointLens, self).__init__(name, model)

          # Load and pre-process the data 
          data.convert_data_to_fluxes()
          df = data.get_standardized_data()
          self.t = df['HJD - 2450000'].values
          self.F = df['I_flux'].values
          self.sigF = df['I_flux_err'].values

          # Define bounded prior distributions 
          BoundedNormal = pm.Bound(pm.Normal, lower=0.0) # Delta_F is positive
          BoundedNormal1 = pm.Bound(pm.Normal, lower=1.) 

          # Define model parameters and their associated priors
          self.Delta_F = BoundedNormal('Delta_F', mu=np.max(self.F), sd=1., testval=3.)
          self.F_base = pm.Normal('F_base', mu=0., sd=0.1, testval=0.)

          ## Posterior is multi-modal in t0 and it's critical that it is
          ## initialized near the true value
          t0_guess_idx = (np.abs(self.F - np.max(self.F))).argmin() 
          self.t0 = pm.Uniform('t0', self.t[0], self.t[-1], 
              testval=self.t[t0_guess_idx])

          self.ln_teff_ln_tE = pm.DensityDist('ln_teff_ln_tE', 
              self.prior_ln_teff_ln_tE, shape=2, 
              testval = [np.log(10), np.log(20.)]) # p(ln_teff,ln_tE)

          # Deterministic transformations
          self.tE = pm.Deterministic("tE", T.exp(self.ln_teff_ln_tE[1]))
          self.u0 = pm.Deterministic("u0", 
              T.exp(self.ln_teff_ln_tE[0])/self.tE) 

          # Noise model parameters
          self.K = BoundedNormal1('K', mu=1., sd=2., testval=1.5)

          # Define helpful class attributes
          self.free_parameters = [RV.name for RV in self.basic_RVs]
          self.initial_logps = [RV.logp(self.test_point) for RV in self.basic_RVs]

          # Define the likelihood function
          Y_obs = pm.Normal('Y_obs', mu=self.mean_function(), sd=self.K*self.sigF, 
              observed=self.F, shape=len(self.F))

      def mean_function(self):
          """Return the mean function which goes into the likeliood."""
          u = T.sqrt(self.u0**2 + ((self.t - self.t0)/self.tE)**2)
          A = lambda u: (u**2 + 2)/(u*T.sqrt(u**2 + 4))

          return self.Delta_F*(A(u) - 1)/(A(self.u0) - 1) + self.F_base

      def prior_ln_teff_ln_tE(self, value):
          """Returns the log of a custom joint prior p(ln_Delta_F, ln_F_base)."""
          teff = T.cast(T.exp(value[0]), 'float64')
          tE = T.cast(T.exp(value[1]), 'float64')
          sig_tE = T.cast(365., 'float64') # p(tE)~N(0, 365)
          sig_u0 = T.cast(1., 'float64') # p(u0)~N(0, 1)

          return -T.log(tE) - (teff/tE)**2/sig_u0**2 - tE**2/sig_tE**2 +\
              value[0] + value[1]

      def peak_mag(self):
          """Returns PSPL magnification at u=u0."""
          u = T.sqrt(self.u0**2 + ((self.t - self.t0)/self.tE)**2)
          A = lambda u: (u**2 + 2)/(u*T.sqrt(u**2 + 4))

          return A(self.u0)
```

```python
class PointSourcePointLensMarginalized(pm.Model):
      def __init__(self, data, use_joint_prior=True, name='', model=None):
          super(PointSourcePointLensMarginalized, self).__init__(name, model)

          # Load and pre-process the data 
          data.convert_data_to_fluxes()
          df = data.get_standardized_data()
          self.t = df['HJD - 2450000'].values
          self.F = df['I_flux'].values
          self.sigF = df['I_flux_err'].values

          # Define bounded prior distributions 
          BoundedNormal = pm.Bound(pm.Normal, lower=0.0) 
          BoundedNormal1 = pm.Bound(pm.Normal, lower=1.) 

          # Define model parameters and their associated priors
          ## Posterior is multi-modal in t0 and it's critical that it is
          ## initialized near the true value
          t0_guess_idx = (np.abs(self.F - np.max(self.F))).argmin() 
          self.t0 = pm.Uniform('t0', self.t[0], self.t[-1], 
              testval=self.t[t0_guess_idx])

          self.ln_teff_ln_tE = pm.DensityDist('ln_teff_ln_tE', 
              self.prior_ln_teff_ln_tE, shape=2, 
              testval = [np.log(10), np.log(20.)]) # p(ln_teff,ln_tE)

          # Deterministic transformations
          self.tE = pm.Deterministic("tE", T.exp(self.ln_teff_ln_tE[1]))
          self.u0 = pm.Deterministic("u0", 
              T.exp(self.ln_teff_ln_tE[0])/self.tE) 

          # Noise model parameters
          self.K = BoundedNormal1('K', mu=1., sd=2., testval=1.5)

          # Define helpful class attributes
          self.free_parameters = [RV.name for RV in self.basic_RVs]
          self.initial_logps = [RV.logp(self.test_point) for RV in self.basic_RVs]

          # Define the likelihood function
          pm.Potential('likelihood', self.marginalized_likelihood())

      def marginalized_likelihood(self):
          """Gaussian likelihood function marginalized over the linear parameters.
          """
          N = len(self.F)        
          F = T._shared(self.F)

          # Linear parameter matrix
          mag_vector = self.magnification()
          A = T.stack([mag_vector, T.ones(N)], axis=1)

          # Covariance matrix
          C_diag = T.pow(self.K*T._shared(self.sigF), 2.)
          C = T.nlinalg.diag(C_diag)

          # Prior matrix
          sigDelta_F = 10.
          sigF_base = 0.1
          L_diag = T._shared(np.array([sigDelta_F, sigF_base])**2.)
          L = T.nlinalg.diag(L_diag)

          # Calculate inverse of covariance matrix for marginalized likelihood
          inv_C = T.nlinalg.diag(T.pow(C_diag, -1.))
          inv_L = T.nlinalg.diag(T.pow(L_diag, -1.))
          term1 = T.dot(A.transpose(), inv_C) 
          term2 = inv_L + T.dot(A.transpose(), T.dot(inv_C, A))
          term3 = T.dot(inv_C, A)
          inv_SIGMA = inv_C - T.dot(term3, T.dot(T.nlinalg.matrix_inverse(term2),
               term1))

          # Calculate determinant of covariance matrix for marginalized likelihood
          det_C = C_diag.prod() 
          det_L = L_diag.prod() 
          det_SIGMA = det_C*det_L*T.nlinalg.det(term2)

          # Calculate marginalized likelihood
          return -0.5*T.dot(F.transpose(), T.dot(inv_SIGMA, F)) -\
                 0.5*N*np.log(2*np.pi) - 0.5*np.log(det_SIGMA)

      def magnification(self):
          """Return the mean function which goes into the likeliood."""
          u = T.sqrt(self.u0**2 + ((self.t - self.t0)/self.tE)**2)
          A = lambda u: (u**2 + 2)/(u*T.sqrt(u**2 + 4))

          return (A(u) - 1)/(A(self.u0) - 1) 

      def prior_ln_teff_ln_tE(self, value):
          """Returns the log of a custom joint prior p(ln_teff, ln_tE)."""
          teff = T.cast(T.exp(value[0]), 'float64')
          tE = T.cast(T.exp(value[1]), 'float64')
          sig_tE = T.cast(365., 'float64') # p(tE)~N(0, 365)
          sig_u0 = T.cast(1., 'float64') # p(u0)~N(0, 1)

          return -T.log(tE) - (teff/tE)**2/sig_u0**2 - tE**2/sig_tE**2 +\
              value[0] + value[1]

      def peak_mag(self):
          """Returns PSPL magnification at u=u0."""
          u = T.sqrt(self.u0**2 + ((self.t - self.t0)/self.tE)**2)
          A = lambda u: (u**2 + 2)/(u*T.sqrt(u**2 + 4))

          return A(self.u0)
```
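
For reference, the `marginalized_likelihood` method above is meant to implement the standard linear-Gaussian marginalization (Woodbury identity plus the matrix determinant lemma), with `A` the design matrix, `C` the diagonal noise covariance and `L` the prior covariance of the linear flux parameters:

$$
\Sigma = C + A L A^\top, \qquad
\Sigma^{-1} = C^{-1} - C^{-1} A \left(L^{-1} + A^\top C^{-1} A\right)^{-1} A^\top C^{-1},
$$

$$
\det \Sigma = \det C \,\det L \,\det\!\left(L^{-1} + A^\top C^{-1} A\right), \qquad
\log p(F) = -\tfrac{1}{2} F^\top \Sigma^{-1} F - \tfrac{1}{2}\log\det\Sigma - \tfrac{N}{2}\log 2\pi .
$$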

There’s a chance the problem is with the lambdas in some of those methods, though I’d expect an exception rather than just hanging.
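
If you want to rule that out, one option is to move the amplification lambda into a module-level helper, since top-level functions serialize cleanly (a sketch; `pspl_amplification` is just an illustrative name):

```python
import theano.tensor as T

def pspl_amplification(u):
    """Point-source point-lens amplification A(u) = (u**2 + 2)/(u*sqrt(u**2 + 4))."""
    return (u**2 + 2) / (u * T.sqrt(u**2 + 4))
```

`mean_function`, `magnification` and `peak_mag` would then call `pspl_amplification(...)` instead of defining the local `A = lambda u: ...`.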

As an aside, subclassing Model is very cool!