Thank you in advance for reading my post; I’m very much a beginner and a bit lost.
My ultimate goal is to predict the “number of fails” for a given player and game difficulty.
Currently, my model has no predictors, so there is nothing for me to adjust within pm.set_data().
I think the heart of my issue is that I’m not sure how to specify a model with multiple predictors and multiple dimensions.
This is representative of my data and coordinates:
# import libraries
import numpy as np
import pandas as pd
import pymc as pm
import arviz as az
RANDOM_SEED = 100
np.random.seed(RANDOM_SEED)
az.style.use("arviz-darkgrid")
# define data
player_list = ['Player1', 'Player1', 'Player1', 'Player1', 'Player2', 'Player2', 'Player2', 'Player2']
difficulty_list = ['Easy', 'Easy', 'Hard', 'Hard', 'Easy', 'Easy', 'Hard', 'Hard']
fails = [3, 2, 4, 4, 4, 4, 5, 4]
df = pd.DataFrame({'Players': player_list, 'Difficulty': difficulty_list, 'Fails': fails})
# create coordinates
player_factor, player_names = pd.factorize(df['Players'], sort=True)
diff_factor, diff_categ = pd.factorize(df['Difficulty'], sort=True)
coords = {
    "obs": df.index.values,
    "player_names": player_names,
    "diff_categ": diff_categ
}
My current model uses a player dimension and a difficulty dimension:
with pm.Model(coords=coords) as m1:
    # using pm.data
    y = pm.MutableData("y", df['Fails'].to_numpy(), dims="obs")
    # Names
    NamesΘα = pm.Gamma("NamesΘα", alpha=3, beta=3, dims="player_names")
    NamesΘβ = pm.Gamma("NamesΘβ", alpha=3, beta=3, dims="player_names")
    # Difficulty
    DiffΘα = pm.Gamma("DiffΘα", alpha=3, beta=3, dims="diff_categ")
    DiffΘβ = pm.Gamma("DiffΘβ", alpha=3, beta=3, dims="diff_categ")
    # likelihood
    Fails = pm.BetaBinomial(
        "Fails",
        n=6,
        alpha=NamesΘα[player_factor] + DiffΘα[diff_factor],
        beta=NamesΘβ[player_factor] + DiffΘβ[diff_factor],
        observed=y,
        dims="obs"
    )
I tried changing the likelihood to use explicit predictors for player name and difficulty, but it raises a shape error (“Input dimension mismatch. One other input has shape[0] = 2, but input[1].shape[0] = 8.”):
with pm.Model(coords=coords) as m2:
    # using pm.data
    x1 = pm.MutableData("x1", player_factor, dims="player_names")
    x2 = pm.MutableData("x2", diff_factor, dims="diff_categ")
    y = pm.MutableData("y", df['Fails'].to_numpy(), dims="obs")
    # Names
    NamesΘα = pm.Gamma("NamesΘα", alpha=3, beta=3, dims="player_names")
    NamesΘβ = pm.Gamma("NamesΘβ", alpha=3, beta=3, dims="player_names")
    # Difficulty
    DiffΘα = pm.Gamma("DiffΘα", alpha=3, beta=3, dims="diff_categ")
    DiffΘβ = pm.Gamma("DiffΘβ", alpha=3, beta=3, dims="diff_categ")
    # likelihood
    Fails = pm.BetaBinomial(
        "Fails",
        n=6,
        alpha=NamesΘα * x1 + DiffΘα * x2,
        beta=NamesΘβ * x1 + DiffΘβ * x2,
        observed=y,
        dims="obs"
    )
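The shape error can be reproduced outside PyMC with plain NumPy arrays standing in for the priors. This is only a minimal sketch: the parameter values are made-up stand-ins (not estimates), and the names `names_alpha`, `diff_alpha`, `player_idx` and `diff_idx` are hypothetical. It suggests that the factorized codes have one entry per observation (length 8), so they act as indices into the length-2 parameter vectors rather than as multipliers.

```python
import numpy as np

# Integer codes as produced by pd.factorize(..., sort=True) on the data above:
# one entry per observation (8 rows), values in {0, 1}.
player_idx = np.array([0, 0, 0, 0, 1, 1, 1, 1])  # Player1 -> 0, Player2 -> 1
diff_idx = np.array([0, 0, 1, 1, 0, 0, 1, 1])    # Easy -> 0, Hard -> 1

# Stand-in values for the per-player and per-difficulty parameters
# (hypothetical numbers, one entry per category).
names_alpha = np.array([1.0, 2.0])
diff_alpha = np.array([0.5, 1.5])

# Multiplying a length-2 vector by a length-8 vector cannot broadcast,
# which mirrors the "shape[0] = 2 ... shape[0] = 8" mismatch.
try:
    names_alpha * player_idx
    mismatch = False
except ValueError:
    mismatch = True

# Indexing instead picks the right parameter for each observation,
# giving one alpha per row of the data.
alpha_per_obs = names_alpha[player_idx] + diff_alpha[diff_idx]
print(mismatch, alpha_per_obs.shape)
```

Under this reading, the index arrays are per-observation data, so it is the length-8 axis ("obs"), not "player_names" or "diff_categ", that matches their shape.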
To summarize:
1. I want to generate predictions (either of ‘Fails’ or of the probability of ‘Fails’ taking a specific value).
2. To accomplish (1), my model requires predictors.
3. I’m not sure how to accomplish (2) and still specify different dimensions for my priors.
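For the "probability of ‘Fails’ taking a specific value" part, the Beta-Binomial probability mass function can be evaluated directly once values of alpha and beta are available for a given player and difficulty. The following is a minimal sketch using only the standard library; the helper names `log_beta` and `beta_binomial_pmf` and the alpha/beta values are hypothetical, chosen for illustration rather than taken from a fitted model.

```python
from math import comb, exp, lgamma


def log_beta(a, b):
    """Log of the Beta function B(a, b), computed via log-gamma."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def beta_binomial_pmf(k, n, alpha, beta):
    """P(Fails = k) for a Beta-Binomial with n trials and shapes alpha, beta."""
    return comb(n, k) * exp(log_beta(k + alpha, n - k + beta) - log_beta(alpha, beta))


n = 6                    # same n as in the likelihood above
alpha, beta = 2.0, 2.0   # hypothetical values, e.g. a player term plus a difficulty term

# Probability of each possible number of fails, 0 through n.
pmf = [beta_binomial_pmf(k, n, alpha, beta) for k in range(n + 1)]
print(pmf[4])  # probability of exactly 4 fails under these assumed values
```

With posterior draws of alpha and beta, averaging this quantity over the draws would be one way to approximate the posterior predictive probability of a specific value.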
Any help/criticism is welcome!