Hi All,
I am still in the process of learning my way through PyMC3 and Bayesian Modelling, and I’m at the point now in my journey where I should be able to build myself some simple models. I tend to learn best when applying myself towards problems I am deeply interested in, and I’ve found myself a topic along with a quality data source to back that up.
I’m looking to model the probable speed of greyhounds over different distances, given past results.
To add a bit of spice to the mixture, I’d specifically like to include a factor that considers age in 6-month increments.
I have 3 years’ worth of historical data that includes each dog’s age in months, the individual finish time for each runner, and the distance of the race.
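As a rough illustration of the 6-month age increments, here is a minimal sketch that maps an age in months to a half-year bin. The function name `age_bin_label` and the label format are made up for this example, and the real data may call for different bin edges.

```python
def age_bin_label(age_months: int, width: int = 6) -> str:
    """Return a label such as '24-29m' for the bin containing age_months."""
    if age_months < 0:
        raise ValueError("age_months must be non-negative")
    start = (age_months // width) * width  # lower edge of the bin
    return f"{start}-{start + width - 1}m"


# Hypothetical ages in months, for illustration only
ages = [18, 23, 24, 29, 30, 41]
labels = [age_bin_label(a) for a in ages]
print(labels)
```

Integer division keeps the bins contiguous and non-overlapping, so every age falls into exactly one label.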
I have tried a Hierarchical approach as below, with the variables defined as:
n_distances = number of unique race distances
n_agesex = number of unique combinations of age bin and sex
n_dogs = number of unique dogs in the dataset
dog_idx = an index number created from a dataframe column using pd.factorize to identify individual dogs
agesex_idx = similar to dog_idx, but an index number for the individual age+sex labels
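For concreteness, the index variables above could be built along these lines. This is only a sketch: the column names (`dog_name`, `sex`, `age_months`, `distance`), the sex codes, and the toy rows are assumptions for the example, not the real dataset, and `distance_idx` is an extra index that is not among the variables listed above.

```python
import pandas as pd

# Toy rows standing in for the real results table (all values are made up)
df = pd.DataFrame({
    "dog_name": ["Ace", "Bolt", "Ace", "Cleo", "Bolt"],
    "sex": ["D", "B", "D", "B", "B"],
    "age_months": [25, 31, 26, 19, 37],
    "distance": [480, 480, 525, 480, 525],
})

# Age in 6-month bins (0-5 -> 0, 6-11 -> 1, ...), combined with sex into one label
df["age_bin"] = df["age_months"] // 6
df["agesex"] = df["sex"] + "_" + df["age_bin"].astype(str)

# pd.factorize returns (integer codes, unique values); the codes index into the uniques
dog_idx, dogs = pd.factorize(df["dog_name"])
agesex_idx, agesex_labels = pd.factorize(df["agesex"])
distance_idx, distances = pd.factorize(df["distance"])

n_dogs = len(dogs)
n_agesex = len(agesex_labels)
n_distances = len(distances)
```

Each `*_idx` array has one entry per race result, so it can be used to pick the matching group-level parameter for every observation.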
import pymc3 as pm

with pm.Model() as Model:
    # Shared hyperpriors
    shared_intercept = pm.Normal('shared_intercept', mu=0, sd=1.5)
    shared_slope = pm.Normal('shared_slope', mu=0, sd=1.5)
    sigma_a = pm.HalfCauchy('sigma_a', 5)
    sigma_b = pm.HalfCauchy('sigma_b', 5)

    # Offset for distances, attempted non-centered parameterization as per T Wiecki
    distance_offset = pm.Normal('distance_offset', mu=shared_intercept, sd=sigma_a, shape=n_distances)
    distance_a = pm.Deterministic('distance_a', shared_intercept + distance_offset * sigma_a)

    # Offset for Age+Sex, attempted non-centered parameterization as per T Wiecki
    agesex_offset = pm.Normal('agesex_offset', mu=shared_intercept, sd=sigma_b, shape=n_agesex)
    agesex_a = pm.Deterministic('agesex_a', shared_intercept + agesex_offset * sigma_b)

    # Offset for individual dog, attempted non-centered parameterization as per T Wiecki
    dog_offset = pm.Normal('dog_offset', mu=shared_intercept, sd=sigma_a, shape=n_dogs)
    dog_a = pm.Deterministic('dog_a', shared_intercept + dog_offset * sigma_a)

    # Expected log finish time per result (distance_a is defined above but not used here)
    estimate = pm.Deterministic('estimate', dog_a[dog_idx] + agesex_a[agesex_idx])
    # Likelihood on the log finish times (sd is left at its default)
    y = pm.Normal("y", mu=estimate, observed=horseTime_log)
    trace = pm.sample(1000, tune=500, cores=4, return_inferencedata=True, init="advi+adapt_diag")
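For comparison, the non-centered parameterization is usually written as a standard-normal offset that is then scaled and shifted: `offset ~ Normal(0, 1)` and `a = mu + offset * sigma`. The NumPy sketch below is plain simulation, not PyMC3, and uses made-up values for `mu` and `sigma`. It checks that this form matches drawing `a ~ Normal(mu, sigma)` directly, and shows what happens when the offset itself is drawn with `mu` and `sigma` before being scaled and shifted, as the offset lines in the model above do. Whether that difference explains the behaviour of the full model is a separate question.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 2.0, 0.5   # made-up hyperparameter values for illustration
n = 200_000

# Centered: draw the group effect directly
centered = rng.normal(mu, sigma, size=n)

# Non-centered: standard-normal offset, then scale and shift
offset = rng.normal(0.0, 1.0, size=n)
non_centered = mu + offset * sigma

# Offset drawn with mu and sigma, then scaled and shifted a second time
offset_ms = rng.normal(mu, sigma, size=n)
double_scaled = mu + offset_ms * sigma

for name, draws in [("centered", centered),
                    ("non-centered", non_centered),
                    ("double-scaled", double_scaled)]:
    print(f"{name:>14}: mean={draws.mean():.2f}, sd={draws.std():.2f}")
```

In expectation the first two have mean `mu` and standard deviation `sigma`, while the third has mean `mu + mu * sigma` and standard deviation `sigma ** 2`.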
Now, an expert will look at this model and see right away that it doesn’t work. My novice eyes seem to think it SHOULD work, but I’m failing to see the fault. I think it may be in the way that I’ve structured the model altogether. I’ll see if I can boil down my thoughts:
We have a Dog D
D belongs to an Age+Sex category a
D has competed in r races over d distances
For each r_{d} the dog clocks a Time t
What I would like to do as an end goal is have a function where I can feed in an ID for the Dog (Name, for example), its current Age+Sex category and a Distance, and get back a trace where the model takes into account all previous times for that Dog over the given distance, conditioned on finish times for all Dogs in that Age+Sex category over the given distance.
Given D, a and d:
Get all t for D over d = dog
Get all t over d for a = all
Condition a random variable on all Data
Condition a random variable on dog Data and all RV
Return Distribution of Probable outcomes
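The steps above can be sketched without PyMC3 using a conjugate normal-normal update. This is a deliberate simplification, not the hierarchical model itself. Assumptions: log finish times are roughly normal, the observation noise `obs_sd` is known, and the function name `predict_log_time`, the record layout, and the toy numbers are all hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def predict_log_time(records, dog, agesex, distance, obs_sd=0.02):
    """Return (mean, sd) of a normal predictive distribution for one log finish time.

    records: iterable of (dog, agesex, distance, log_time) tuples.
    """
    # Collect times over this distance for the whole category ("all") and for this dog
    all_t = [t for d, a, dist, t in records if a == agesex and dist == distance]
    dog_t = [t for d, a, dist, t in records if d == dog and dist == distance]
    if len(all_t) < 2:
        raise ValueError("need at least two category results to form a prior")

    # Prior on the dog's mean log time, taken from the whole category
    # (note: the dog's own results also sit in all_t, a simplification)
    prior_mean, prior_sd = mean(all_t), stdev(all_t)
    if prior_sd == 0:
        raise ValueError("category times have zero spread")

    # Conjugate update of that prior with the dog's own results
    prior_prec = 1.0 / prior_sd ** 2
    data_prec = len(dog_t) / obs_sd ** 2
    post_var = 1.0 / (prior_prec + data_prec)
    post_mean = post_var * (prior_prec * prior_mean + sum(dog_t) / obs_sd ** 2)

    # Predictive sd adds observation noise to the remaining uncertainty
    return post_mean, sqrt(post_var + obs_sd ** 2)


# Toy records: (dog, agesex, distance, log finish time); values are made up
records = [
    ("Ace", "D_4", 480, 3.37),
    ("Ace", "D_4", 480, 3.36),
    ("Bolt", "D_4", 480, 3.40),
    ("Cleo", "D_4", 480, 3.42),
    ("Dash", "D_4", 480, 3.39),
]
m, s = predict_log_time(records, "Ace", "D_4", 480)
print(m, s)
```

The predicted mean lands between the dog's own average and the category average, and a dog with no results at that distance simply gets the category prior back.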
Hope that makes sense, I’m not math trained so don’t cringe too hard at my trying to use notation!