I’ve been going crazy these last few weeks trying to implement this example in PyMC3, and I’ve decided to post it on the Discourse in the hope that someone can help me, even if only with a few notes.
I need to find the posterior (which is a continuous distribution), P(age | specific_name), so that I can later compute probabilities such as P(age = 70 | specific_name).
In my problem, I’m going to specify that my specific_name is ‘Evian’, a name that you are much more likely to find among older people than among young people.
By Bayes’ theorem: p(theta | y) = p(y | theta) p(theta) / p(y).
I know that the parameters of my problem are:
theta -> age -> which represents the probability of an Evian’s age range.
y -> specific_name (‘Evian’ in this case) -> which represents the number of people called Evian for every ‘N’ english people.
so p(theta) will be my prior.
p(y|theta) will be my likelihood.
Knowing that ‘Evian’ is a rare name across all of England and that, as I said before, it’s easier to find older people with that name, I have to build that model in PyMC, also knowing that the age parameter ranges between 1 and 101 years.
For p(theta) (my prior), I suppose something like a TruncatedNormal:
theta = pm.TruncatedNormal('theta', mu=85, sigma=15, lower=0.0, upper=101.0)
But how should I deal with the likelihood? Is my framing of the problem right?
Thank you for your time and patience
Well, the likelihood P(y | theta) asks something like “what is the probability of observing the name ‘Evian’ given we know the person is, say, 50 years old?”. This can simply be modeled as a Poisson distribution with rate parameter assumed to be theta. You can then pass in an observed array of ages (of people with a name, say, ‘Evian’) and define a model as follows:
import pymc3 as pm
import numpy as np

with pm.Model() as model:
    theta = pm.TruncatedNormal('theta', mu=85, sigma=15, lower=0.0, upper=101.0)
    likelihood = pm.Poisson('likelihood', mu=theta, observed=<your_observed_values>)
You can play around with other distributions that you think would better fit your data. I hope this helps!
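For readers who want to see what this model encodes: with a single bounded parameter you can approximate the same posterior p(theta | y) on a grid, using only the standard library, and check it against what PyMC3 samples. The observed ages below are made-up values, purely for illustration:

```python
import math

# Grid approximation of p(theta | y) for the model above:
#   prior:      TruncatedNormal(mu=85, sigma=15) on [0, 101]
#   likelihood: each observed age y_i ~ Poisson(theta)

def log_prior(theta, mu=85.0, sigma=15.0):
    # Unnormalized truncated-normal log density; the truncation
    # constant cancels when we normalize over the grid.
    return -0.5 * ((theta - mu) / sigma) ** 2

def log_poisson(y, theta):
    # log Poisson pmf: y*log(theta) - theta - log(y!)
    return y * math.log(theta) - theta - math.lgamma(y + 1)

def grid_posterior(observed, n=1000, lo=0.0, hi=101.0):
    # Midpoint grid over the support of theta.
    thetas = [lo + (hi - lo) * (i + 0.5) / n for i in range(n)]
    logs = [log_prior(t) + sum(log_poisson(y, t) for y in observed)
            for t in thetas]
    m = max(logs)  # subtract the max for numerical stability
    weights = [math.exp(l - m) for l in logs]
    total = sum(weights)
    return thetas, [w / total for w in weights]

# Hypothetical observed ages of people called 'Evian' (made up).
ages = [78, 85, 90, 70, 88]
thetas, post = grid_posterior(ages)
post_mean = sum(t * p for t, p in zip(thetas, post))
```

The posterior mean ends up between the data mean and the prior mean of 85, which is a quick sanity check that the prior and likelihood are combining as Bayes’ theorem says they should.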
Thank you so much for your reply, it really helps!
Another question relating to this, but more complex:
What if, for example, I now have to give the model two parameters (‘age’ and ‘names’, not just ‘age’ as in my example above)?
Do I have to create two PyMC variables, two priors? And then pass those priors to my new likelihood, whose distribution could be the Normal distribution (specifying ‘mu’ as ‘age’ and ‘sd’ as ‘names’), and then ask my model for the logp of, for example, P(age = 89 | name = Evian) or P(name = John | age = 20)?
This is the table I made with the probabilities I want to obtain (obviously an approximation) if I ask PyMC for the probability of each name and each age:
How can I do that? Is it correct to use the Normal distribution as the likelihood in this example (taking into account the table)? And how can I ask the program for the logp of, for example, P(age = 89 | name = Evian)?
Thanks for your attention!
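For what it’s worth, once you have a table like the one described above, the conditional probabilities asked about follow mechanically from the definition P(age | name) = P(name, age) / P(name). Here is a minimal standard-library sketch of that arithmetic; the joint probabilities below are hypothetical placeholders, not values from the actual table:

```python
import math

# Hypothetical joint table P(name, age) -- placeholder numbers only.
joint = {
    ('Evian', 89): 0.02,
    ('Evian', 20): 0.001,
    ('John', 89): 0.05,
    ('John', 20): 0.10,
}

def p_name(name):
    # Marginal P(name) = sum over ages of P(name, age).
    return sum(p for (n, _), p in joint.items() if n == name)

def p_age_given_name(age, name):
    # Conditioning: P(age | name) = P(name, age) / P(name).
    return joint.get((name, age), 0.0) / p_name(name)

# e.g. the log-probability of P(age = 89 | name = 'Evian')
logp = math.log(p_age_given_name(89, 'Evian'))
```

In a PyMC model the same quantity would come out of the fitted joint posterior rather than a hand-written table, but the conditioning step is the same idea.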