Statistical Rethinking Ch 3 Q1 Code / observed = data errors / beginner question

bobthebuilder · June 2, 2022, 3:40am

I’m a beginner and am working my way through rethinking. I have an issue where I am trying to include data in my models to inform the parameters by using the observed keyword argument but get what appear to be incorrect parameter estimates. I’ve also tried making hierarchical normal models previous to this and have experienced similar issues. If anyone has any tips they would be appreciated. Thanks!

Also I realized that the data got cut out of the screenshot so the previous line of code is:

data = np.array([0,0,0,1,1,1,1,1,1]) #supposed to be binomial data: 6 successes and 9 trials

Edit: For a bit more context, the official solution for the problem is the first problem in this file: pymc-resources/Chapter_3.ipynb at main · pymc-devs/pymc-resources · GitHub

Edit 2: also changed the list to a numpy array

jessegrabowski · June 2, 2022, 4:15am

If you look carefully at the code in the notebook you linked, they use a Bernoulli distribution, whereas you are using a binomial distribution.

So you’re asking a different question than the one in the example, and the data take on a different interpretation. A Bernoulli distribution gives the probability of observing a success for a single experiment, like flipping a coin. Given this question and your data, it looks like the probability is something like 2/3 (6 successes and 9 trials).

With a binomial, the data are now the number of successes, out of n trials. You are saying, “each of these data are the number of successes out of n trials, what is the probability of success for a single flip?” Given this question, looking at your data, it’s something just below 10%, which the model correctly finds. In this setting, the data are 0/9, 0/9, 0/9, 1/9, 1/9, 1/9, 1/9, 1/9, 1/9.

So change the distribution and you’ll get the same answer as the notebook, but also realize that likelihood functions are ways to ask your data different kinds of questions!

cluhmann · June 2, 2022, 4:22am

Welcome!

By specifying n=len(data), you are implying that you are considering a scenario in which there are 9 “flips” or attempts. You then provide an array of observed data:

observed=[0,0,0,1,1,1,1,1,1]

But this suggests that the first time you “flipped the coin” 9 times, it came up “head” 0 times. The second time you “flipped the coin” 9 times, it came up “head” 0 times. Etc. If you were instead looking to represent a scenario in which you “flipped the coin” 9 times and it came up heads 6 times, then you would do something like this:

y = pm.Binomial("y", p=prior, n=10, observed=[6])

Alternatively, you can leave the dichotomous data in its natural state and adopt a Bernoulli likelihood:

y = pm.Bernoulli("y", p=prior, observed=[0,0,0,1,1,1,1,1,1])

bobthebuilder · June 2, 2022, 4:30am

Thank you, your solution does fix the problem! When I see a lot of Bernoulli trials my mind immediately goes to the binomial distribution so I like your explanation of “what question am I asking the data” and am going to have to get in the mindset of being granular about that. Also good explanation.

bobthebuilder · June 2, 2022, 4:35am

Hi! Thank you as well and this answer is helpful and both solutions do fix the error similar to Jesse’s response. Since you’re a member of the community team I’d like to ask about the documentation if you’re knowledgeable on it. I was looking at it for pm.Binomial and such and couldn’t figure out all the arguments and keyword arguments the method would take since it just says the general *args and **kwargs, which is how I ended up on this forum. Do you know where I would look for more details on things that don’t have readily findable information

cluhmann · June 2, 2022, 4:57am

It’s a bit hidden, because it’s in the documentation for the parent class, pm.Distribution, and even there there isn’t much in the way of a description. As you may be able to tell, the documentation is in the middle of a major revision in preparation for the release of version 4. But if that page or other aspects of the documentation aren’t what you need, feel free to post questions here and someone should be able to get you headed in the right direction!

[Edit: check out the v3 doc for a bit more info.]
[Edit 2: or check out this docstring.]

bobthebuilder · June 2, 2022, 5:20am

Thank you! I’ll probably come here with my questions more often then!

Topic		Replies	Views
Do I have to use pm.Bernoulli or pm.Binomial?	6	1155	May 29, 2023
The results of a switch are observed directly (beginner) Questions	1	910	November 11, 2018
Sampler issues on Beta prior Binomial likelihood v5 bug	5	393	December 8, 2022
Specifying hierarchical Binomial model when individuals have varying numbers of trials? Questions	6	877	September 23, 2019
Model not reproducing observed data? Questions	9	1116	July 25, 2018

Statistical Rethinking Ch 3 Q1 Code / observed = data errors / beginner question

Related topics