Hi,
I am new to PyMC and Bayesian statistics. I am trying to analyze a dataset that has two columns (for example, ‘index’ and ‘rate’), where each ‘index’ value has multiple rates. I want to use the rates that belong to each index to estimate a separate set of parameters per index. Here is a sample of what I tried, which raises a ValueError:
import pandas as pd
import pymc as pm

data = pd.DataFrame({
    'index': ['a_1', 'a_1', 'a_2', 'a_2', 'b_1', 'b_1', 'b_2', 'b_2'],
    'rate': [0.1, 0.3, 0.1, 0.3, 0.1, 0.5, 0.4, 0.5]
})
coords = {'index': data['index'].unique()}
with pm.Model(coords=coords) as model:
    alpha_prior = pm.Uniform('alpha_prior', 0, 100, dims='index')  # shape parameter 1
    beta_prior = pm.Uniform('beta_prior', 0, 100, dims='index')  # shape parameter 2
    y = pm.Beta('y', alpha=alpha_prior, beta=beta_prior, observed=data['rate'], dims='index')
    trace2 = pm.sample()
Later, I tried the same thing with the ‘shape’ argument, as in the code below (this version works):
data['id'] = pd.factorize(data['index'])[0]
with pm.Model() as beta_model:
    alpha = pm.Uniform('alpha', 0, 100, shape=4)
    beta = pm.Uniform('beta', 0, 100, shape=4)
    y = pm.Beta('y', alpha=alpha[data['id']], beta=beta[data['id']], observed=data['rate'])
    trace = pm.sample()
I am confused about whether PyMC knows which data points belong to which index and fits each group accordingly, or whether it simply tries to match the observations to the number of dimensions specified in the code.
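To make the question concrete, here is a small NumPy-only sketch of what I think the indexing in the working version does. It is my assumption, not something I have confirmed, that `alpha[data['id']]` behaves like ordinary integer-array indexing. The values in `alpha_per_group` are made up for illustration, and I use `np.unique` instead of `pd.factorize` (it sorts the labels, which gives the same order for this data).

```python
import numpy as np

labels = np.array(['a_1', 'a_1', 'a_2', 'a_2', 'b_1', 'b_1', 'b_2', 'b_2'])
rates = np.array([0.1, 0.3, 0.1, 0.3, 0.1, 0.5, 0.4, 0.5])

# One integer code per observation, pointing at its group.
# np.unique sorts the labels; pd.factorize keeps order of first appearance.
groups, idx = np.unique(labels, return_inverse=True)

# Hypothetical per-group parameter values (made up, one per unique label).
alpha_per_group = np.array([1.0, 2.0, 3.0, 4.0])

# Integer-array indexing repeats each group's value for its observations,
# so the result has one entry per data point rather than one per group.
alpha_per_obs = alpha_per_group[idx]

print(groups)
print(idx)
print(alpha_per_obs.shape, rates.shape)
```

If this is right, the parameter vector has length 4 but the indexed version has length 8, matching the observations, which would explain why the second model runs while the first one, with 8 observations declared on a dimension of length 4, does not.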
How can the previous code be altered to work in the same manner as the second one? Or would it be better to create separate models for each index?