Yeah, I was following steps I found online and did not notice FAQs about the find_MAP. So, for my case I am looking at continuous distributions, so which one of the start_vals do you think I should choose for my situation?
Have you tried without start vals, i.e letting PyMC automatically do the work for you?
Same thing for the sampler – when you can use NUTS, use it (PyMC will automatically choose the most appropriate for you).
I will give it a try and return back to you
Well, I let it to select NUTS and removed the MAP, but, I get this error:
Autoassigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Sequential sampling (3 chains in 1 job)
NUTS: [beta_S1, alpha_S1]
Sampling chain 0, 0 divergences: 0%  0/2000 [00:00<?, ?it/s]
Traceback (most recent call last):
trace_S1 = pm.sample(draws=1500, tune=500, chains=3, cores=1, discard_tuned_samples=True)
trace = _sample_many(**sample_args)
**kwargs
for it, (strace, diverging) in enumerate(sampling):
for obj in iterable:
point, stats = step.step(point)
apoint, stats = self.astep(array)
raise SamplingError("Bad initial energy")
pymc3.exceptions.SamplingError: Bad initial energy
So, Where do I go from here? I know the error could be in the step, but, How can I decide what step to use?
Also, side question, how can I plot multiple traces in one plot? because I am looking at multiple signals and thought of combing all alphas and betas together in one plot.

Bad Initial Energy
can come from several issues. You can look at this entry in the FAQ.  To plot multidimensional variables in a single plot, you can use
az.plot_trace(trace, compact=True)
I know its a basic question, but, how can I obtain the posterior distribution of my model? and say if I used, for example, the code below from the Posterior Predictive Checks link:
_, ax = plt.subplots(figsize=(12, 6))
ax.hist([n.mean() for n in ppc['n']], bins=19, alpha=0.5)
ax.axvline(data.mean())
ax.set(title='Posterior predictive of the mean', xlabel='mean(x)', ylabel='Frequency');
My Code:
sns.distplot(post_pred_S1['observed_data_S1'].mean(axis=1), label='Posterior predictive means of Sensor 1 ', ax=ax)
ax.axvline(np.array(S1_ave[0]).mean(), ls='', color='r', label='True mean of Sensor 1 ')
and found that the data mean is far away from the post pred, what do I need to do to improve the model where the mean is close to post pred?
This is how I create a model:
with pm.Model() as model_S1:
# Prior Distributions for unknown model parameters:
alpha_S1 = pm.HalfNormal('alpha_S1', sigma=10)
beta_S1 = pm.HalfNormal('beta_S1', sigma=1)
# Observed data is from a Likelihood distributions (Likelihood (sampling distribution) of observations):
observed_data_S1 = pm.Beta('observed_data_S1', alpha=alpha_S1, beta=beta_S1,
observed=S1_ave)
# instantiate sampler
step_S1= pm.Metropolis(vars=[alpha_S1, beta_S1], model=model_S1) ## Best one
# step = pm.HamiltonianMC()
# step = pm.NUTS()
# draw 5000 posterior samples
trace_S1 = pm.sample(draws=1500, step=step_S1, tune=500, chains=3, cores=1, discard_tuned_samples=True)
# Obtaining Posterior Predictive Sampling:
post_pred_S1 = pm.sample_posterior_predictive(trace_S1, samples=1000)
print(post_pred_S1['observed_data_S1'].shape)
print('\nSummary: ')
print(pm.stats.summary(data=post_pred_S1))
The posterior samples are in trace_S1
, and you can access each parameter’s posterior like in a dict: trace_S1["alpha_S1"]
.
Any reason why you’re using Metropolis and not letting PyMC3 choose for you? It would be NUTS in this case, which is usually recommended.
If you remember from previous replies, the error that I get for bad initial energy is from using NUTS (I already let the program to run with without mentioning the instantiate sampler and I get the bad initial energy), I have a question, is there a way to use MCMC as instantiate sampler?
Also, say that I use the trace samples to get the posterior samples, should I get the mean values of alpha and beta and use them to get the mean and std and use them as loc and scale of distribution to get the beta distribution shape using the scipy package or do I need to a different type of equations to calculate the lock and scale from alpha and beta to use them in the scipy package for beta distribution?
(I found this website: https://www.vosesoftware.com/riskwiki/Betadistribution.php)
Bad Initial Energy
usually denotes a problem with the model, not the sampler. NUTS is robust and stateoftheart, and a good thing is that it spits out warnings when something’s wrong. Switching to an older algorithm, like Metropolis, when you don’t have to doesn’t necessarily mean the problem is gone – it’s just that the sampler doesn’t warn you about it.
So my advice would be to figure out why your model spits out a Bad Initial Energy
– you’ll get a better model in the end. Here, your model uses very wellknown and continuous distributions; this is NUTS kingdom
I’m not sure I understand the question: both Metropolis and NUTS are MCMC algorithms.
There are very good resources to introduce you to the Bayesian workflow and methods. I listed some of them here.
Hope this helps
Well, for my case, I have multiple lists and in the few lists, it was working fine. But, then, in one of the lists, it gives out bad initial energy. So, basically it lies within the list that I am using?
I managed to let the pymc3 auto assign a sampler and was using the NUTS and also managed to get some results. I have another question, if you remember my case where I have around 5 signals, I want to use for example signal 2 as my prior and signal 1 as my likelihood and posterior to be the combination of both (the concept is kind of similar to data fusion where I try to correct the data from sensor 2 with sensor 1) is this possible?
I have been playing with NUTS and it works fine. But, I came with problem, I was looking into the implementing for example histogram data from signals into the NUTS, however, I keep on getting the bad initial energy error. I thought at the beginning that the error is due to the outlier, so, I implement IQR outlier detection in order to remove the outliers. I removed the outliers but still I keep on getting the same error. Could it be because that there are negative values in the histogram?
Also, can you check my previous reply because I have another question?
I’m not sure I understand your goal and question Can you please restate them please? Ideally, can you also post the edited model, the one you say it’s working?
Well, Here is my new model function,
def Bayesian_Interface(data):
###################################################### Outliers ########################################################
S_FLNO = IQR_outlier_detection(data)
# S_FLNO = Stats_outlier_detection(data)
########################################################################################################################
# Bayesian Model:
with pm.Model() as model_S:
# Prior Distributions for unknown model parameters:
alpha_S = pm.HalfNormal('alpha_S', sigma=10)
beta_S = pm.HalfNormal('beta_S', sigma=1)
# Observed data is from a Likelihood distributions (Likelihood (sampling distribution) of observations):
observed_data_S = pm.Beta('observed_data_S', alpha=alpha_S, beta=beta_S, observed=S_FLNO)
# draw 5000 posterior samples
trace_S = pm.sample(draws=5000, tune=2000, chains=3, cores=1)
# Obtaining Posterior Predictive Sampling:
post_pred_S = pm.sample_posterior_predictive(trace_S, samples=3000)
print(post_pred_S['observed_data_S'].shape)
print('\nSummary: ')
print(pm.stats.summary(data=trace_S))
print(pm.stats.summary(data=post_pred_S))
return trace_S, post_pred_S
So, I managed to get this function to work with NUTS if I used the pdf data of the signal coming from the algorithm from link from StackOverflow (Link: https://stackoverflow.com/questions/6620471/fittingempiricaldistributiontotheoreticaloneswithscipypython?noredirect=1&lq=1) [first answer], but, I keep on getting bad initial energy error again when I changed the input data into the histogram of signal or peaks of the signal. I thought because of two reasons:
 because of outliers
 because of negative values
So, I implemented the IQR outlier detection algorithm in order to detect and remove any outlier data. It managed to remove the outliers, [I used outlier detection algorithm because some data in the PDF were very high which caused bad initial energy error], but still, I keep on getting the same error. So, my question is
 Could it be, because I am using negative values into the model which also caused the bad initial energy or there could be another reason?
My second question is:
 if you remember my case where I have around 5 signals, I want to use for example signal 2 as my prior and signal 1 as my likelihood and posterior to be the combination of both (the concept is kind of similar to data fusion where I try to correct the data from sensor 2 with sensor 1) is this possible?
Side question:
What could be the reason for this warning?
The number of effective samples is smaller than 25% for some parameters.
Thanks, that’s much clearer.
You’re using a Beta likelihood, which has support on (0, 1), so if you’re observed values go to <= 0 or >= 1, the log probability of these points will be inf and you’ll get bad initial energy.
So maybe the problem is that you’re not using an appropriate likelihood distribution? If your values are on the real lines, the Normal could be more appropriate – of course, this depends on your use case, domain knowledge and scientific context. If the values are too spread out, the good practice is to standardize them, otherwise the sampler is seeing a nasty geometry.
That’s probably why removing “outliers” kind of work: if you remove all the values >= 1, you don’t get inf logp on the right hand side. But I’m very skeptical of these “outlier detections”: the risk is to end up throwing out valuable data to please the model – always better to try and adapt the model to the data instead of the opposite.
I’m not familiar with this concept, so I’m afraid I can’t help you here. This sounds pretty natural to express in the Bayesian framework though: just write down the prior and likelihood in your model.
The sampler is having a hard time exploring the space for one or some parameters – you can usually see which with diagnostic plots and n_eff
in az.summary
(that’s why sampling with NUTS is a huge advantage: you get diagnostics for free).
Hope this helps
I was thinking of extracting the posterior information of say Model_S2
in order to use it as the prior in the Model_S1
and use the likelihood of observed_data_S1
. Am I going the right way? If it is, how can I extract the posterior information from Model_S2
Mmmh ok, I think this NB could help you go in the right direction then. I’ve never done this personally though
Thank for the link, I think I got rough idea. Another question, if I want to design custom distribution, for instance, noncentral t like in scipy nct. What do I need in order to create a custom distribution with pymc3 and choose priors for it?
I came across new dataset and I checked which distributions fits the new dataset and it turns out to be Cauchy Distribution. I already know that there is a Cauchy function in PYMC3, but, what I am wondering is what type of priors to use where Cauchy is the likelihood distribution?
I looked into wikepedia and understood that scale parameter is greater than 0 so I am thinking halfnorm for the scale/beta parameter. but with location/alpha parameter, is it possible to assume to use norm distribution? and input the values into Cauchy likelihood distribution?