My misunderstanding with the argument 'observed'

Hi everyone!
I’m trying to build a model for the 100 most frequent surnames in Spain.
I have the data, which lists the surnames and how often each one occurs. I saved only the surname-frequency column as a .csv file and loaded it into the variable data:

import numpy as np
import arviz as az

data = np.loadtxt('…/TFG/Frecuency_Surnames.csv')
az.plot_kde(data, rug=True)

to see how that data behaves (like this):

[Image: Captura_frecuencia_surnames, KDE plot of the surname frequencies]

Now, I suppose the data follow a discrete distribution, more specifically the Geometric.

The problem comes when I create the model and try to pass the observed data as an argument (as I saw in other examples):

import pymc3 as pm

with pm.Model() as surnames:

  dist_surnames = pm.Geometric('dist_surnames', p=0.042, observed=data)
  trace = pm.sample(10000, cores=1)

az.plot_posterior(trace)

The program returns an error and I don’t know why.

Am I perhaps misunderstanding the observed argument?
According to the book Bayesian Analysis with Python by Osvaldo Martin, the observed argument is how we tell PyMC3 that we want to condition the unknowns (the probability distribution) on the knowns (the data).

Thank you!

Please post the error message.

I imagine the problem is that you didn’t leave anything to be sampled. The surnames are observed and the parameter is fixed at p=0.042, so the model has no free variables. I imagine you want to estimate p, so you should make it a random variable:

p = pm.Normal(…)  # or whatever dist makes sense; p must lie in (0, 1), so e.g. a Beta
Surnames = pm.Geometric('dist_surnames', p=p, observed=data)
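
To see what "conditioning the unknown on the data" means here without running a sampler, here is a minimal sketch in plain Python. It assumes a Beta(a, b) prior on p (my choice for illustration, not something from the original post) and a Geometric likelihood with support 1, 2, 3, ... In that case the posterior is also a Beta, so it can be written down directly. The function name and the toy data are hypothetical.

```python
def beta_geometric_posterior(data, a=1.0, b=1.0):
    """Return (a_post, b_post) of the Beta posterior for p.

    Assumes a Beta(a, b) prior on p and that each observation is a
    positive integer drawn from a Geometric(p) with support 1, 2, 3, ...
    The likelihood is p**n * (1 - p)**(sum(data) - n), so the posterior
    is Beta(a + n, b + sum(data) - n).
    """
    if any(x < 1 for x in data):
        raise ValueError("geometric observations must be >= 1")
    n = len(data)
    total = sum(data)
    return a + n, b + (total - n)


# Hypothetical toy data, NOT the surname frequencies from the question.
toy_data = [3, 1, 7, 2, 5]

a_post, b_post = beta_geometric_posterior(toy_data)
posterior_mean = a_post / (a_post + b_post)
print(a_post, b_post, posterior_mean)
```

With p fixed to a number there is no prior to update, which is why the sampler has nothing to do. Once p has a prior, the observed data turn that prior into a posterior, and that posterior is what pm.sample approximates.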
