Mass matrix error with the degrees-of-freedom and scale parameters of a StudentT distribution

I have been using the following code to perform a Bayesian fusion of StudentT distributions.

Fusion code:

import numpy as np
import pymc3 as pm
from pymc3 import StudentT, Cauchy

def S2_0_Bayesian_Interface(data):
########################################################################################################################
    with pm.Model() as model_0:
            # Prior Distributions for unknown model parameters:
            nu_0 = pm.HalfNormal('nu_0', sigma=1)
            sigma_0 = pm.HalfNormal('sigma_0', sigma=5)
            mu_0 = pm.Normal('mu_0', mu=0, sigma=5)

            # Observed data is from a Likelihood distributions (Likelihood (sampling distribution) of observations):
            observed_data_0 = StudentT('observed_data_0', nu=nu_0, mu=mu_0, sigma=sigma_0, observed=data)

            # Printing the result of log_likelihood:
            # print('log_likelihood result:', model_0)

            # draw 1000 posterior samples (per chain, 3 chains)
            trace_0 = pm.sample(draws=1000, tune=1000, chains=3, cores=1, progressbar=True)
    
            # Obtaining Posterior Predictive Sampling:
            post_pred_0 = pm.sample_posterior_predictive(trace_0, samples=1000)
            print(post_pred_0['observed_data_0'].shape)
            print('\nSummary: ')
            print(pm.stats.summary(data=trace_0))
            print(pm.stats.summary(data=post_pred_0))
########################################################################################################################
    return trace_0, post_pred_0

def S2_P_Bayesian_Interface(nu, mu, sigma, data):
########################################################################################################################
    with pm.Model() as model_P:
            # Prior Distributions for unknown model parameters from posterior distribution:
            nu_P = pm.HalfNormal('nu_P', sigma=np.std(nu))
            sigma_P = pm.HalfNormal('sigma_P', sigma=np.std(sigma))
            mu_P = pm.Normal('mu_P', mu=np.mean(mu), sigma=np.std(mu))

            # Observed data is from a Likelihood distributions (Likelihood (sampling distribution) of observations):
            observed_data_P = StudentT('observed_data_P', nu=nu_P, mu=mu_P, sigma=sigma_P, observed=data)

            # Printing the result of log_likelihood:
            # print('log_likelihood result:', model_P)

            # draw 1000 posterior samples (per chain, 3 chains)
            trace_P = pm.sample(draws=1000, tune=1000, chains=3, cores=1, progressbar=True)
    
            # Obtaining Posterior Predictive Sampling:
            post_pred_P = pm.sample_posterior_predictive(trace_P, samples=1000)
            print(post_pred_P['observed_data_P'].shape)
            print('\nSummary: ')
            print(pm.stats.summary(data=trace_P))
            print(pm.stats.summary(data=post_pred_P))
########################################################################################################################
    return trace_P, post_pred_P

def S1_C_Bayesian_Interface(nu, mu, sigma, data):
########################################################################################################################
    with pm.Model() as model_C:
            # Prior Distributions for unknown model parameters from posterior distribution:
            nu_C = pm.HalfNormal('nu_C', sigma=np.std(nu))
            sigma_C = pm.HalfNormal('sigma_C', sigma=np.std(sigma))
            mu_C = pm.Normal('mu_C', mu=np.mean(mu), sigma=np.std(mu))
            

            # Observed data is from a Likelihood distributions (Likelihood (sampling distribution) of observations):
            observed_data_C = StudentT('observed_data_C', nu=nu_C, mu=mu_C, sigma=sigma_C, observed=data)
    
            # Printing the result of log_likelihood:
            # print('log_likelihood result:', model_C)

            # draw 1000 posterior samples (per chain, 3 chains)
            trace_C = pm.sample(draws=1000, tune=1000, chains=3, cores=1, progressbar=True)
    
            # Obtaining Posterior Predictive Sampling:
            post_pred_C = pm.sample_posterior_predictive(trace_C, samples=1000)
            print(post_pred_C['observed_data_C'].shape)
            print('\nSummary: ')
            print(pm.stats.summary(data=trace_C))
            print(pm.stats.summary(data=post_pred_C))
########################################################################################################################
    return trace_C, post_pred_C

### Main Program:

# Initial parameters:
trace_S2_0, post_pred_S2_0 = S2_0_Bayesian_Interface(data=S2_distribution_list[0])
# get mean and variance of the trace
# use scipy to plot distribution using the code from [fitting empirical distribution](https://stackoverflow.com/questions/6620471/fitting-empirical-distribution-to-theoretical-ones-with-scipy-python)

for m in range(0, len(S2_distribution_list)):
       # Prediction Stage:
       trace_S2_P, post_pred_S2_P = S2_P_Bayesian_Interface(data=S2_distribution_list[m], nu=trace_S2_0['nu_0'], mu=trace_S2_0['mu_0'], sigma=trace_S2_0['sigma_0'])
       # get mean and variance of the trace
       # use scipy to plot distribution using the code from [fitting empirical distribution](https://stackoverflow.com/questions/6620471/fitting-empirical-distribution-to-theoretical-ones-with-scipy-python)

       # Correction Stage:
       trace_S1_C, post_pred_S1_C = S1_C_Bayesian_Interface(data=S1_distribution_list[m], nu=trace_S2_P['nu_P'], mu=trace_S2_P['mu_P'], sigma=trace_S2_P['sigma_P'])
       # get mean and variance of the trace
       # use scipy to plot distribution using the code from [fitting empirical distribution](https://stackoverflow.com/questions/6620471/fitting-empirical-distribution-to-theoretical-ones-with-scipy-python)
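
For illustration, the comments above ("get mean and variance of the trace") correspond to something like the following, which also shows the scale of the values that get passed between stages (a sketch only; variable names follow the models above):

nu_samples = trace_S2_0['nu_0']          # posterior draws of the degrees of freedom
mu_samples = trace_S2_0['mu_0']          # posterior draws of the location
sigma_samples = trace_S2_0['sigma_0']    # posterior draws of the scale
print('mu:    mean = %.6f, std = %.6f' % (np.mean(mu_samples), np.std(mu_samples)))
print('sigma: mean = %.6f, std = %.6f' % (np.mean(sigma_samples), np.std(sigma_samples)))
print('nu:    mean = %.6f, std = %.6f' % (np.mean(nu_samples), np.std(nu_samples)))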

I used the following model to obtain the prior and posterior distributions that serve as input for the fusion part.

prior distributions code:

def Prior_Bayesian_Interface(data):  # function name assumed; the original def line was not included in the post
########################################################################################################################
    with pm.Model() as model:
            # Prior Distributions for unknown model parameters from posterior distribution:
            sigma = pm.HalfNormal('sigma', sigma=1)
            mu = pm.Normal('mu', mu=0, sigma=1)
            nu = pm.Exponential('nu', 1.0)

            # Observed data is from a likelihood distributions (Likelihood (sampling distribution) of observations):
            observed_data = StudentT('observed_data', nu=nu, mu=mu, sigma=sigma, observed=data)
            # Or, alternatively (only one likelihood should be active in a given model):
            # observed_data = Cauchy('observed_data', alpha=mu, beta=sigma, observed=data)
            # Printing the result of log_likelihood:
            # print('log_likelihood result:', model)
            
            # draw 1000 posterior samples (per chain, 3 chains)
            trace = pm.sample(draws=1000, tune=1000, chains=3, cores=1, progressbar=True)

            # Obtaining Posterior Predictive Sampling:
            post_pred = pm.sample_posterior_predictive(trace, samples=1000)
            print(post_pred['observed_data'].shape)
            print('\nSummary: ')
            print(pm.stats.summary(data=trace))
            print(pm.stats.summary(data=post_pred))
########################################################################################################################
    return trace, post_pred

However, when it enters the fusion part, the scale and location parameters end up very small (e.g. 0.00xxx, i.e. on the order of 10^-x), which causes the error below.

Error:

ValueError: Mass matrix contains zeros on the diagonal. 
The derivative of RV `sigma_P_log__`.ravel()[0] is zero.
The derivative of RV `nu_P_log__`.ravel()[0] is zero.

I found this presentation on the StudentT distribution, which looks at the location and scale parameters. Hence, is there a standard method to rescale the location and scale parameters of a StudentT and/or Cauchy distribution so that the error above goes away and the parameter values stay in a reasonable numerical range?
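
For example, the kind of rescaling I have in mind is standardizing the data before fitting, so that mu and sigma are inferred on a scale near 1, and then transforming the posterior draws back afterwards. This is only a sketch of the idea (the model and variable names here are illustrative, not taken from my code above):

# Sketch: fit on standardized data so the sampler works with O(1) parameters,
# then convert the posterior draws back to the original (very small) units.
scale_factor = np.std(data)
data_scaled = data / scale_factor

with pm.Model() as model_scaled:
    nu = pm.Exponential('nu', 1.0)
    sigma = pm.HalfNormal('sigma', sigma=1)
    mu = pm.Normal('mu', mu=0, sigma=1)
    obs = StudentT('obs', nu=nu, mu=mu, sigma=sigma, observed=data_scaled)
    trace_scaled = pm.sample(draws=1000, tune=1000, chains=3, cores=1)

# Back-transform: if data = scale_factor * data_scaled, then the location and scale
# of the original data are scale_factor * mu and scale_factor * sigma; nu is unchanged.
mu_original = trace_scaled['mu'] * scale_factor
sigma_original = trace_scaled['sigma'] * scale_factor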

I also found this question, Mass Matrix error, where someone answered that the init parameter in pm.sample needs to be changed, e.g. pm.sample(init='adapt_diag', ...). So my question here is: what is the functionality of init in pm.sample?

init specifies which strategy is used to initialize the NUTS kernel. We have a dozen or so different strategies, which you can read about in the docs: Inference — PyMC3 3.11.4 documentation
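
For example, switching the initialization strategy is just a keyword argument (the default is 'jitter+adapt_diag'):

trace = pm.sample(draws=1000, tune=1000, chains=3, cores=1, init='adapt_diag')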

Thank you for the reply. I tried adapt_diag, but it still gives me the mass matrix error for the degrees-of-freedom and scale parameters once it enters the data fusion part.

So, is there a way I can obtain the posterior when my parameters sit in that very small range (e.g. 0.00xxx, i.e. on the order of 10^-x)?

Well, I would like to know two things (the synthetic example at the end of this post illustrates the parameter range I am talking about):

  • What would be a suitable init setting in pm.sample for the posterior traces when my scale, location, and degrees-of-freedom parameters are in the range of roughly 0.00xxx (on the order of 10^-x)?
  • What range of values should I use in the priors when my scale, location, and degrees-of-freedom parameters are in that same range?
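
To make the parameter range concrete, here is a purely synthetic example of the kind of values my data takes (illustrative only, not my real S1/S2 data):

# Illustrative only: data whose location and scale are on the order of 1e-3,
# which is roughly where the mass matrix error shows up for me.
np.random.seed(0)
fake_data = 0.003 + 0.0005 * np.random.standard_t(df=5, size=500)
print(np.mean(fake_data), np.std(fake_data))   # both values are far below 1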