Partsworth estimation causes "bad energy" warning on some observations

Hello, everyone,

I am pretty new to data analysis in python and the code is still “work in progress” which is why I apologize for the mess in advance.

With the following code I try to estimate the partsworth of the participants in a conjoint task.

Most of the code is from Cam Davidson-Pilon (which is free to use). The idea is to implement new things after it runs through smoothly the first time.

This is the model:

with pm.Model() as HB_v3:
    alpha = pm.Normal('alpha', 0,
    partsworth = pm.MvNormal("partsworth", alpha,
                             tau=np.eye(n_attributes_in_model), shape=(n_participants, n_attributes_in_model))
    observations = []
    for i, (_, selection) in enumerate(selections.iteritems()):
        to_be_stacked = []
        tempPartworth = partsworth[i, :]
        nan_mask = pd.notnull(selection)
        for _, choice in choices.get_group(CJ_Version[i]).iloc[:,:-1].groupby(axis=1, level=0):
            dotProduct =[nan_mask.values].values, tempPartworth)
        stack = tt.stack(to_be_stacked, axis = 0).T
        softmax = tt.nnet.softmax(stack)
        temp = pm.Categorical("Obs_%s" %,


The problem is that the code doesn’t run for all participants. NUTS gives me the error message “bad initialize energy -inf” on startup.

I already tried the tips from this post:

and also the notes from the “diagnosis page”:

and found with the following code out that some of the observed variables (but not all) get a logp of “-inf”:

for RV in HB_v3.basic_RVs:
    tp = HB_v3.test_point
    print("** ", )
    print("lenght: ", len(RV.tag.test_value))

(I am going to trim the output to the relevant observed variables)

Partsworth (marked in blue):

I then played with the “testval” parameter, which only led to all participants becoming “-inf” at some point.

Next I tried to switch off the “jitter” or sample with Metropolis. Both without success (Metropolis runs through for all participants, but afterwards all partsworth get the same value for all participants over all tasks).

I do not have a clue, why NUTS can sample some participants and others not.
Does anybody have an idea?

Thanks in advance!


Hi Daniel! It’s difficult to be categorical (pun intended) without the data, but just to check:

  • Are you sure there are no Nans in your observations (the stuff you give to observed in pm.Categorical)?
  • Are you sure your observations are all integers? Otherwise, I think Categorical will complain
  • If your observations are aggregated (i.e multinomial), are there zeros in it? Same here: Multinomial won’t like it
  • Finally, maybe try:
pm.Multinomial(f"", n=1, p=softmax,

instead of pm.Categorical(bla bla bla). I remember Categorical was misbehaving – it is fixed now but I don’t remember if it’s already released or still on master.

Hey Alex,

Thank you for your quick reply! I really appreciate it!

To give you an idea of the data, here is a small part:

in the rows are the different tasks, and in the columns the answer of each participant to each task (15 Tasks in total).

There are no “zeros” in the dataset, but Pandas returns me “objects” as dtypes:

but visually the corresponding (& troublesome) participant (530) seems to be an integer.

(I also tried to send the “selection” as int

observed = selection[na_mask.values].astype(int).values)

to the sampler. Did not work either.

The Answers of the participants are not aggregated (the number indicates which product they have selected in the task, e.g. 1 = product one; 5 = product five, and so on).

I tried your “pm.Multinomial” suggestion with:

temp = pm.Multinomial("Obs_%s" %,
                              n = finalSelection.shape[0],
                              p = softmax,

but it raises another error message during compilation:

The shape[1] = 20 can only be the alternative choices within the task (I am showing twenty product configurations within one “shelf”. The participant has to choose one out of the 20 alternatives).

When I am deleting the “.T” behind

stack = tt.stack(to_be_stacked, axis = 0)

it surprisingly compiles, but all participants are “-inf”.

Any ideas?

Offtopic question:
Is there somewhere a good overview and explanation of the different probability distributions? If I understood it correctly, choosing the appropriate distribution is key in bayesian inference statistics. In psychology on the other hand you normally only get to know & use the gaussian normal distribution…

Thank you for your help,

Ooh ok! Then I think there are two problems:

You have to convert your columns to integers then, otherwise they are treated as strings – which is what “object” means in Pandas – and sampling won’t work.

Ow ok that’s important information! Then if you wanna use Categorical (or Multinomial, that’s the same), you have to restructure your data: your categories are your product numbers, not your different tasks (the C1_x). In other words, your dataframe needs to have the product numbers as row indices, not the tasks.
And each cell would contain the number of times Individual_X chose product_y, out of 15 trials (i.e 15 different tasks). That is exactly the case where you need a Multinomial likelihood (not Categorical as the data would not be one-hot encoded).
Phew, I hope that’s not too messy an explanation :stuck_out_tongue_winking_eye:

For short presentations, your have the PyMC3 API.
Usually, the Wikipedia page for any given distribution is quite good also.
And if you really wanna train yourself to Bayesian inference, I listed very good resources here.

Hope it’s useful, and good luck!

Hey, Alex,

once again thank you for your prompt reply and excuse my late answer :slight_smile:

I tried to send the selection as “int” into the syntax, but it didn’t work either.
If you ask me, I’d guess you’re probably right about the format of my answers (–> the C1_x).
The problem is that there are “zeros” after the transformation. If I understood you correctly, categorical and multinomal sampling methods do not work if there are zeros in the observed dataset. So this won’t be an option.

I’m currently trying to rewrite the code to a more classic version of the conjoint estimate (each alternative is a line where the answer is encoded in 0/1; sampling with Bernoulli), but the tensors give me a headache ;-).

I´ll keep you updated :slight_smile: (… and would ask you again if I am not able to solve the tensor issue :stuck_out_tongue: )


Hi Daniel! Some precisions:

I think I wasn’t clear enough: multinomial won’t sample (at least not out of the box, you can adapt it if needed) if you have some categories with a probability of 0, but I don’t think it’s your case. Categorical will of course sample, as a Categorical is just a generalization of Bernoulli to more than two categories.

Did you do it with pandas? It should be straightforward: df.astype(int). More generally, are you doing your data cleaning in pandas? Because it’s weird that you’re doing it with the tensors – i.e with theano.

Hey, Alex,

please excuse my disappearance.

I changed the complete cleaning process. Now it seems to work :slight_smile:

Next month I get new data, which I use to test how well my attempt with Pymc3 works :slight_smile:

However, I still have two questions, which were also not quite clear to me yet.

Is it possible to slice the trace plots somehow? I usually have a lot of partworths/beta weights and matplotlib doesn’t want to display them all at the same time…
Is it possible to output the traceplots with another visualization framework (e.g. plotly)?

To make a prediction with my model I just have to define it as theano.shared, correct?
For example, would my choices have to be created as a shared variable?

best regards

Hi Dan, happy to hear you managed to make it work! Hope the predictions will make sense :wink:
Regarding your questions:

  1. Not sure I undestood what you mean by “slicing”, but you can give ArviZ coords argument to plot_trace to select only some parameters in the trace – note that this takes xarrays in the background, which can take a little time to get used to. What I do in this case is az.from_pymc3(my_trace) to understand the structure of the different chains and then use coords.
    Easier and useful could also be to just pass compact=True to az.plot_trace – this will plot your multidimensional parameters (e.g a 6-dimensional beta) into the same plot, allowing matplotlib to diplay it. You can also choose only the parameters you’re interested in with the vars argument.
    Just FYI, ArviZ developers are working on integrating Bokeh as backend for plotting, in addition to matplotlib… Stay tuned :wink:

  2. Exactly. More precisely, you have the pm.Data class now, which is neat and easy to use – see this example NB.

Hope this helps, and good luck with your model :vulcan_salute: