Mass matrix error with the degrees-of-freedom and scale parameters of a Student-T distribution

I have been using the following code to perform a Bayesian fusion of Student-T distributions.

Fusion code:

import numpy as np
import pymc3 as pm
from pymc3 import StudentT, Cauchy

def S2_0_Bayesian_Interface(data):
########################################################################################################################
    with pm.Model() as model_0:
        # Prior distributions for the unknown model parameters:
        nu_0 = pm.HalfNormal('nu_0', sigma=1)
        sigma_0 = pm.HalfNormal('sigma_0', sigma=5)
        mu_0 = pm.Normal('mu_0', mu=0, sigma=5)

        # Likelihood (sampling distribution) of the observations:
        observed_data_0 = StudentT('observed_data_0', nu=nu_0, mu=mu_0, sigma=sigma_0, observed=data)

        # Printing the result of log_likelihood:
        # print('log_likelihood result:', model_0)

        # Draw 1000 posterior samples per chain:
        trace_0 = pm.sample(draws=1000, tune=1000, chains=3, cores=1, progressbar=True)

        # Obtain posterior predictive samples:
        post_pred_0 = pm.sample_posterior_predictive(trace_0, samples=1000)
        print(post_pred_0['observed_data_0'].shape)
        print('\nSummary: ')
        print(pm.stats.summary(data=trace_0))
        print(pm.stats.summary(data=post_pred_0))
########################################################################################################################
    return trace_0, post_pred_0

def S2_P_Bayesian_Interface(nu, mu, sigma, data):
########################################################################################################################
    with pm.Model() as model_P:
        # Priors for the unknown model parameters, built from the previous posterior:
        nu_P = pm.HalfNormal('nu_P', sigma=np.std(nu))
        sigma_P = pm.HalfNormal('sigma_P', sigma=np.std(sigma))
        mu_P = pm.Normal('mu_P', mu=np.mean(mu), sigma=np.std(mu))

        # Likelihood (sampling distribution) of the observations:
        observed_data_P = StudentT('observed_data_P', nu=nu_P, mu=mu_P, sigma=sigma_P, observed=data)

        # Printing the result of log_likelihood:
        # print('log_likelihood result:', model_P)

        # Draw 1000 posterior samples per chain:
        trace_P = pm.sample(draws=1000, tune=1000, chains=3, cores=1, progressbar=True)

        # Obtain posterior predictive samples:
        post_pred_P = pm.sample_posterior_predictive(trace_P, samples=1000)
        print(post_pred_P['observed_data_P'].shape)
        print('\nSummary: ')
        print(pm.stats.summary(data=trace_P))
        print(pm.stats.summary(data=post_pred_P))
########################################################################################################################
    return trace_P, post_pred_P

def S1_C_Bayesian_Interface(nu, mu, sigma, data):
########################################################################################################################
    with pm.Model() as model_C:
        # Priors for the unknown model parameters, built from the previous posterior:
        nu_C = pm.HalfNormal('nu_C', sigma=np.std(nu))
        sigma_C = pm.HalfNormal('sigma_C', sigma=np.std(sigma))
        mu_C = pm.Normal('mu_C', mu=np.mean(mu), sigma=np.std(mu))

        # Likelihood (sampling distribution) of the observations:
        observed_data_C = StudentT('observed_data_C', nu=nu_C, mu=mu_C, sigma=sigma_C, observed=data)

        # Printing the result of log_likelihood:
        # print('log_likelihood result:', model_C)

        # Draw 1000 posterior samples per chain:
        trace_C = pm.sample(draws=1000, tune=1000, chains=3, cores=1, progressbar=True)

        # Obtain posterior predictive samples:
        post_pred_C = pm.sample_posterior_predictive(trace_C, samples=1000)
        print(post_pred_C['observed_data_C'].shape)
        print('\nSummary: ')
        print(pm.stats.summary(data=trace_C))
        print(pm.stats.summary(data=post_pred_C))
########################################################################################################################
    return trace_C, post_pred_C

### Main Program:

# Initial parameters:
trace_S2_0, post_pred_S2_0 = S2_0_Bayesian_Interface(data=S2_distribution_list[0])
# get mean and variance of the trace
# use scipy to plot distribution using the code from [fitting empirical distribution](https://stackoverflow.com/questions/6620471/fitting-empirical-distribution-to-theoretical-ones-with-scipy-python)

for m in range(len(S2_distribution_list)):
    # Prediction Stage:
    trace_S2_P, post_pred_S2_P = S2_P_Bayesian_Interface(data=S2_distribution_list[m], nu=trace_S2_0['nu_0'], mu=trace_S2_0['mu_0'], sigma=trace_S2_0['sigma_0'])
    # get mean and variance of the trace
    # use scipy to plot distribution using the code from [fitting empirical distribution](https://stackoverflow.com/questions/6620471/fitting-empirical-distribution-to-theoretical-ones-with-scipy-python)

    # Correction Stage:
    trace_S1_C, post_pred_S1_C = S1_C_Bayesian_Interface(data=S1_distribution_list[m], nu=trace_S2_P['nu_P'], mu=trace_S2_P['mu_P'], sigma=trace_S2_P['sigma_P'])
    # get mean and variance of the trace
    # use scipy to plot distribution using the code from [fitting empirical distribution](https://stackoverflow.com/questions/6620471/fitting-empirical-distribution-to-theoretical-ones-with-scipy-python)
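
For reference, the "get mean and variance of the trace" step just reads the posterior sample arrays out of the returned MultiTrace (a sketch, using the variable names from S2_0_Bayesian_Interface):

# Sketch: summarizing the posterior draws that feed the next fusion stage.
nu_samples = trace_S2_0['nu_0']          # 1-D array of posterior samples
mu_samples = trace_S2_0['mu_0']
sigma_samples = trace_S2_0['sigma_0']
print('nu_0    mean/var:', nu_samples.mean(), nu_samples.var())
print('mu_0    mean/var:', mu_samples.mean(), mu_samples.var())
print('sigma_0 mean/var:', sigma_samples.mean(), sigma_samples.var())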

I used the following model to fit the prior distributions; its posterior is then used as the input for the fusion part.

Prior distribution code:

def Prior_Bayesian_Interface(data):
########################################################################################################################
    with pm.Model() as model:
        # Prior distributions for the unknown model parameters:
        sigma = pm.HalfNormal('sigma', sigma=1)
        mu = pm.Normal('mu', mu=0, sigma=1)
        nu = pm.Exponential('nu', 1.0)

        # Likelihood (sampling distribution) of the observations:
        observed_data = StudentT('observed_data', nu=nu, mu=mu, sigma=sigma, observed=data)
        # Or, alternatively (use one of the two, not both, since PyMC3 forbids duplicate variable names):
        # observed_data = Cauchy('observed_data', alpha=mu, beta=sigma, observed=data)

        # Printing the result of log_likelihood:
        # print('log_likelihood result:', model)

        # Draw 1000 posterior samples per chain:
        trace = pm.sample(draws=1000, tune=1000, chains=3, cores=1, progressbar=True)

        # Obtain posterior predictive samples:
        post_pred = pm.sample_posterior_predictive(trace, samples=1000)
        print(post_pred['observed_data'].shape)
        print('\nSummary: ')
        print(pm.stats.summary(data=trace))
        print(pm.stats.summary(data=post_pred))
########################################################################################################################
    return trace, post_pred

However, when it enters the fusion part, the scale and location parameters end up very small (Ex: 0.00xxx, i.e. around xx.x*10-x), which causes the error below.

Error:

ValueError: Mass matrix contains zeros on the diagonal. 
The derivative of RV `sigma_P_log__`.ravel()[0] is zero.
The derivative of RV `nu_P_log__`.ravel()[0] is zero.

I found this presentation on the StudentT distribution, which looks at the location and scale parameters. Hence, is there a standard method to rescale the location and scale parameters of the StudentT and/or Cauchy distribution, in order to solve the error above and keep the parameter values at a reasonable magnitude?
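
For concreteness, the kind of rescaling I am asking about would look something like this (a sketch: standardize the data before fitting so the sampler sees parameters of order one, then transform the posterior back):

# Sketch: fit on standardized data so the sampler works with O(1) parameters.
data = np.asarray(S2_distribution_list[0])
data_mean, data_std = data.mean(), data.std()
data_scaled = (data - data_mean) / data_std   # roughly zero mean, unit scale

trace_scaled, post_pred_scaled = S2_0_Bayesian_Interface(data=data_scaled)

# Transform the posterior draws back to the original units:
mu_orig = trace_scaled['mu_0'] * data_std + data_mean
sigma_orig = trace_scaled['sigma_0'] * data_std
# nu (degrees of freedom) is scale-free, so it needs no back-transform.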

I also found this question, Mass Matrix error, where an answer said the init parameter in pm.sample needs to be set, i.e. pm.sample(init='adapt_diag', ...). So my question here is: what is the functionality of init in pm.sample?

init specifies what strategy is used to initialize the NUTS kernel. We have over a dozen or so different strategies, which you can read about in the docs: Inference — PyMC3 3.11.4 documentation
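
For example, a call using it would look like this (argument values are just illustrative):

with model_P:
    trace_P = pm.sample(draws=1000, tune=2000, chains=3, cores=1,
                        init='adapt_diag',   # adapt a diagonal mass matrix during tuning
                        target_accept=0.95,  # smaller NUTS steps; often helps with divergences
                        progressbar=True)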

Thank you for the reply. Well, I tried adapt_diag, but it keeps giving me the mass matrix error on the degrees-of-freedom and scale parameters when it enters the data fusion part.

So, is there a way I could get a usable posterior when my parameters are in the range of (Ex: 0.00xxx, xx.x*10-x)?

Well, I would like to know two things:

  • What would be a suitable init in the sample part for the posterior traces when my scale, location, and degree-of-freedom parameters are in the range of (Ex: 0.00xxx, xx.x*10-x)?
  • What range of values should I use in the priors when my parameters are in that range? (See the sketch after this list for the kind of data-driven prior I mean.)
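
For concreteness on the second point, the data-driven priors I have in mind would look something like this (a sketch; the Exponential rate for nu is just illustrative):

# Sketch: priors whose scales follow the data instead of fixed values like 1 or 5.
scale_guess = np.std(data)               # rough magnitude of the data, e.g. ~1e-3

with pm.Model() as model_scaled:
    mu = pm.Normal('mu', mu=np.mean(data), sigma=scale_guess)
    sigma = pm.HalfNormal('sigma', sigma=scale_guess)
    nu = pm.Exponential('nu', lam=1/10)  # prior mean of 10 keeps nu away from 0
    obs = pm.StudentT('obs', nu=nu, mu=mu, sigma=sigma, observed=data)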

I came across the following link: Bayesian recipes (3): Simple, efficient statistics for Normal distributions. In it, I saw the part where the author says:

Focusing on the mean:
This Bayesian recipe is a little more complicated than some others in this series because the frequency distribution has two parameters (the mean and standard deviation) rather than just one. The Bayesian calculations give a surface showing how likely you think each possible combination of mean and standard deviation is, given the data and your initial beliefs. However, this is often just too much information! Usually it is the mean that we care about most. To understand how our beliefs about the true mean should be shaped we need to somehow sum all the possibilities over all the potential standard deviations. Fortunately, there is a well known distribution that does just that. It’s called Student’s T. Student’s T distribution has three parameters, called df (degrees of freedom), mu, and sigma. These can be calculated from the parameters of the normal-gamma using these simple formulae:
df = 2 × alpha
mu = mu (yes that’s right, it’s the same number for both distributions)
sigma = beta / (alpha × kappa)

I would like to know if it is possible to use these equations for mu and sigma. I have been looking for articles on how to obtain mu, sigma, beta, and alpha.
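
For reference, the quoted formulae translate directly into code like this (a sketch; alpha, beta, kappa, and the normal-gamma mu are the hyperparameters from the linked post, and sigma is kept exactly as the quote writes it):

# Sketch: Student-T parameters from normal-gamma hyperparameters,
# following the quoted formulae verbatim.
def normal_gamma_to_student_t(mu, kappa, alpha, beta):
    df = 2 * alpha                    # degrees of freedom
    t_mu = mu                         # the location is the same for both distributions
    t_sigma = beta / (alpha * kappa)  # scale, as written in the quote
    return df, t_mu, t_sigma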