What kind of distribution is suited for a phase space

HJAM · August 24, 2022, 7:26pm

Hi,

Let’s assume that you play golf on an uneven green. There are multiple trajectories that can result in a successful putt.
A very fast putt aimed directly at the hole would reach the hole, but could be moving too fast to drop in. On the other hand, if the ball is aimed at too large an angle, it will come to a halt out of reach of the hole, regardless of the speed. There could also be multiple slopes on the green, making the ball swirl from left to right and back.

It is possible to display the golfer’s choice of angle and speed as a so-called phase space.

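| Speed (m/s) / angle | -0.75 | -0.5 | -0.25 |
|---|---|---|---|
| 6 | | | |
| 5.75 | 0 | 0 | 0 |
| 5.5 | 0 | 0 | 0 |
| 5.25 | 0 | 0 | 0 |
| 5 | 0 | 0 | 0 |
| 4.75 | 0 | 0 | 0 |
| 4.5 | 0 | 0 | 0 |
| 4.25 | 0 | 0 | 0 |
| 4 | 0 | 0 | 0 |
| 3.75 | 0 | 0 | 1 |
| 3.5 | 0 | 1 | 1 |
| 3.25 | 0 | 1 | 1 |
| 3 | 1 | 1 | 1 |
| 2.75 | 1 | 1 | 1 |
| 2.5 | 1 | 1 | 0 |
| 2.25 | 1 | 0 | 0 |
| 2 | 1 | 0 | 0 |
| 1.75 | 0 | 0 | 0 |
| 1.5 | 0 | 0 | 0 |
| 1.25 | 0 | 0 | 0 |
| 1 | 0 | 0 | 0 |
| 0.75 | 0 | 0 | 0 |
| 0.5 | 0 | 0 | 0 |
| 0.25 | 0 | 0 | 0 |
| 0 | 0 | 0 | 0 |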

On the x-axis, you see the angle, where 0 is straight at the hole. On the y-axis, you see the (launch) speed of the ball in m/s. As you can see, there are multiple combinations of speed and angle that result in a successful putt.
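For experimenting, the surviving part of the table can be encoded as a grid. A minimal NumPy sketch, assuming the three angle columns shown above and speeds from 0 to 5.75 m/s (the 6 m/s row is left out because its values did not survive in the post); `bands`, `success` and `cells` are illustrative names, not anything from the thread:

```python
import numpy as np

# Speeds 0.0, 0.25, ..., 5.75 m/s (24 rows) and the three surviving angle columns
speeds = np.arange(0.0, 6.0, 0.25)
angles = np.array([-0.75, -0.5, -0.25])

# Inclusive speed range that sinks the putt in each angle column, read off the table
bands = [(2.0, 3.0), (2.5, 3.5), (2.75, 3.75)]

# success[i, j] is True when (speeds[i], angles[j]) is a 1 in the table
success = np.column_stack([(speeds >= lo) & (speeds <= hi) for lo, hi in bands])

# Long format: one (speed, angle, success) row per grid cell
S, A = np.meshgrid(speeds, angles, indexing="ij")
cells = np.column_stack([S.ravel(), A.ravel(), success.ravel().astype(float)])

print(cells.shape)          # (72, 3)
print(int(success.sum()))   # 15
```

The long layout is what a model needs: each row is one candidate putt together with its binary outcome.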
I assume that the golfer tries to putt the ball into the hole. However, I don’t know which angle-speed combination he actually targeted. I also don’t know which combination was actually realised: it could be the targeted combination, or the targeted combination ± a random error. I am hopeful that, with all the possibilities of Bayesian inference, the mystery can be solved.

I need to make v_target and angle_target latent parameters, and if the increments are made small enough, the phase space approximates the real world. But what is a fitting probability density function? Please advise how you would continue.
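One way to turn "the targeted combo ± a random error" into a likelihood: treat (angle_target, v_target) as latent, assume the realised putt is the target plus Gaussian execution noise, and take the probability of sinking as the share of that noise distribution that lands on a 1 in the table. A Monte Carlo sketch, assuming independent Gaussian errors with hypothetical standard deviations and nearest-cell lookup; `p_success` is a hypothetical helper, and only the three surviving angle columns are encoded, so putts drifting outside them are scored as misses purely as an artefact of the truncated table:

```python
import numpy as np

speeds = np.arange(0.0, 6.0, 0.25)                  # grid rows, m/s
angles = np.array([-0.75, -0.5, -0.25])             # surviving grid columns
bands = [(2.0, 3.0), (2.5, 3.5), (2.75, 3.75)]      # success speed range per column
success = np.column_stack([(speeds >= lo) & (speeds <= hi) for lo, hi in bands])
STEP = 0.25                                         # grid spacing on both axes

def p_success(angle_target, v_target, sd_angle, sd_speed, n=20_000, seed=0):
    """Monte Carlo estimate of P(putt sinks | aimed at (angle_target, v_target)).

    Assumes realised putt = target + independent Gaussian error with standard
    deviations sd_angle and sd_speed. Each realised putt is snapped to the
    nearest grid cell; putts that fall off the tabulated grid count as misses.
    """
    rng = np.random.default_rng(seed)
    angle = angle_target + sd_angle * rng.standard_normal(n)
    speed = v_target + sd_speed * rng.standard_normal(n)
    j = np.rint((angle - angles[0]) / STEP).astype(int)
    i = np.rint((speed - speeds[0]) / STEP).astype(int)
    on_grid = (i >= 0) & (i < speeds.size) & (j >= 0) & (j < angles.size)
    hits = np.zeros(n, dtype=bool)
    hits[on_grid] = success[i[on_grid], j[on_grid]]
    return hits.mean()

print(p_success(-0.5, 3.0, sd_angle=0.1, sd_speed=0.3))
```

In a Bayesian model this probability would play the role of p in a Bernoulli likelihood for each observed putt, with priors on the two targets and the two error scales. Because nearest-cell lookup is piecewise constant, a smooth version would be needed for gradient-based samplers such as NUTS.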

Thank you

junpenglao · August 25, 2022, 5:19am

You can take a look at Model building and expansion for golf putting — PyMC example gallery

HJAM · August 25, 2022, 5:34am

Thanks for the suggestion. I did, but that example assumes a flat green. Because my green has many slopes, the geometric angle approach doesn’t work. I only know which speed/angle combinations result in a successful putt.

jessegrabowski · August 25, 2022, 6:14am

You could extend the model @junpenglao linked by embedding a differential equation (an ODE in time) that describes the motion of the ball over the green, which I assume you already have since you’re talking about phase spaces. You could then model the probability of success in the same geometric way, but conditioned on 1) the initial angle/speed conditions, 2) the terrain, and 3) the stopping position of the ball. Have a look at the ODE examples in the gallery for ideas.
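To give a feel for the differential-equation route: the ball’s motion over the green can be stepped numerically in time. A minimal sketch, assuming a point-mass ball, constant rolling resistance `mu`, a hypothetical constant slope `grad_h`, the hole lying on the +x axis, and no hole capture (spin and the rolling-inertia factor are also ignored); `roll` is a hypothetical helper, not a PyMC function:

```python
import numpy as np

G = 9.81  # gravitational acceleration, m/s^2

def roll(speed, angle, grad_h=(0.0, 0.02), mu=0.06, dt=1e-3, t_max=30.0):
    """Step a point-mass ball across the green until it stops.

    Assumed equation of motion (not a full golf-ball model):
        dv/dt = -G * grad_h - mu * G * v / |v|
    i.e. the downhill pull of a plane with constant gradient grad_h plus
    constant rolling resistance. The ball starts at (0, 0); angle = 0 aims
    along +x. Returns the stopping position in metres. mu must exceed the
    slope, otherwise the ball never stops and t_max ends the loop.
    """
    grad = np.asarray(grad_h, dtype=float)
    pos = np.zeros(2)
    vel = speed * np.array([np.cos(angle), np.sin(angle)])
    for _ in range(int(t_max / dt)):
        v = np.linalg.norm(vel)
        if v < 1e-3:  # treat as stopped
            break
        acc = -G * grad - mu * G * vel / v
        vel = vel + acc * dt  # semi-implicit Euler: velocity first,
        pos = pos + vel * dt  # then position with the updated velocity
    return pos

print(roll(2.0, -0.5))
```

On a flat green this reproduces the textbook stopping distance v²/(2·mu·G). Running each (speed, angle) cell through such a simulator and checking whether the ball crosses the hole slowly enough would regenerate a success table like the one in the question.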

But, sticking with the phase space as provided, you could describe all the successes as coming from a multivariate normal in angle-speed space. The probability that an (angle, speed) 2-tuple came from that distribution would then give you the probability (or at least a feature with which to model the probability) that the tuple will sink a putt. Here’s what I came up with:


```python
import pymc as pm

# Load in your table as df: index = speed, columns = angle.
# Set df.index.name = 'speed' and df.columns.name = 'angle'
# so that unstack/reset_index produce those column names.

# Convert the table to a long dataframe with 25 * 21 rows
df_long = (df.unstack()
             .reset_index()
             .rename(columns={0: 'success'})
             .astype(float))

coords = {'vars': ['speed', 'angle']}
with pm.Model(coords=coords) as model:
    # mutable=True is the PyMC v4 spelling; newer releases make pm.Data mutable by default
    X = pm.Data('X', df_long[['speed', 'angle']], mutable=True)
    y = pm.Data('y', df_long.success, mutable=True)

    # Only the (speed, angle) cells that sank the putt
    X_success = pm.Data('X_success', df_long.loc[df_long.success == 1, ['speed', 'angle']], mutable=True)

    # Centre of the success region in (speed, angle) space
    mu = pm.Normal('mu', mu=0, sigma=1, dims='vars')

    # Covariance of the success region via an LKJ prior on its Cholesky factor
    sd_dist = pm.HalfStudentT.dist(sigma=1, nu=3)
    chol, *_ = pm.LKJCholeskyCov('chol', eta=1, n=2, sd_dist=sd_dist)

    # Model only the successful putts as a multivariate normal
    phase_dist = pm.MvNormal('phase_state', mu=mu, chol=chol, observed=X_success)

    # Get the log probability of every cell (success and fail) and use it as the Bernoulli logit
    logits = pm.logp(phase_dist, X)
    success = pm.Bernoulli('success', logit_p=logits, observed=y)
```

Here’s the success distribution the model comes up with (data in red):

And here’s the probability of making the putt. Looks like it’s overly pessimistic in general – being in the “bliss zone” only barely puts success over 50%. You might be able to do more with the Bernoulli logits in the model to fix that. Hopefully this should be enough to get you started, though.
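A small numeric illustration of why the fit looks pessimistic: the model uses the multivariate normal log-density directly as the Bernoulli logit, and in two dimensions that log-density can never exceed its value at the mean, −log(2π) − ½·log|Σ|. The best achievable success probability is therefore the sigmoid of that number, which for moderate spreads sits near 0.5. The covariance below is hypothetical, and the affine link `alpha + beta * logp` is one possible fix, not something proposed in the thread:

```python
import numpy as np

def mvn_logpdf(x, mu, cov):
    """Multivariate normal log-density evaluated at each row of x."""
    d = np.atleast_2d(x) - mu
    k = mu.size
    _, logdet = np.linalg.slogdet(cov)
    maha = np.einsum("ij,jk,ik->i", d, np.linalg.inv(cov), d)  # squared Mahalanobis distance
    return -0.5 * (k * np.log(2 * np.pi) + logdet + maha)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

mu = np.array([3.0, -0.5])          # hypothetical (speed, angle) centre
cov = np.diag([0.5**2, 0.3**2])     # hypothetical spreads

peak = mvn_logpdf(mu, mu, cov)[0]   # largest logit the model can ever produce
print(peak, sigmoid(peak))          # success probability capped just above 0.5

# Hypothetical affine link: logit = alpha + beta * logp
alpha, beta = 3.0, 1.0
print(sigmoid(alpha + beta * peak))
```

In the PyMC model, α and β would get priors (for example `pm.Normal`), and the Bernoulli would receive `logit_p=alpha + beta * logits`.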

HJAM · August 25, 2022, 10:33am

Thank you very much, Jesse! I am going to absorb this all and follow your suggestions!
I was thinking about the multivariate skew-normal distribution in particular, so I am happy that you mention the multivariate distribution.
However, what I find difficult to understand is the density of the data. In theory, the outcomes are binary, so shouldn’t the density be flat? Why is state {angle: -0.5, speed: 3} dark blue, whereas {angle: 0, speed: 4} is light blue? Don’t both result in a successful putt? Or am I missing something crucial?

jessegrabowski · August 25, 2022, 11:01am

There are two spaces you need to flip between. In the “observation space”, the outcomes are binary, yes. For any specific putt, it only succeeds or fails.

At the same time, this model proposes that putts come from an unobserved “angle-speed space”. In that space, each angle-speed 2-tuple has an associated probability of producing a successful putt. So, before the golfer putts, the model proposes that putts which come from a region around state {angle: -0.5, speed: 3} have a higher probability of being successful than those that come from the region around {angle: 0, speed: 4}.

Another way to think about it would be as a 2D Gaussian convolution (like Photoshop’s Gaussian blur) over the phase space table. Since the table is discrete and we want continuous values, we can interpolate with a Gaussian kernel in every direction. For points in between the successful putts (the dark region), we’re confident these should also be successful. But do we really believe in the sharp boundaries in the table? The Gaussian accounts for this lack of confidence by putting low but non-zero probability in regions like the {0, 4} neighborhood.
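The smoothing analogy can be made concrete. A minimal NumPy sketch, assuming the three surviving angle columns of the table and a hypothetical kernel width of one grid cell in each direction; zero padding treats everything off the grid as a miss. The model above differs in that it learns a single parametric Gaussian from the data rather than blurring with a fixed kernel:

```python
import numpy as np

speeds = np.arange(0.0, 6.0, 0.25)                  # rows: 0 .. 5.75 m/s
angles = np.array([-0.75, -0.5, -0.25])             # surviving columns
bands = [(2.0, 3.0), (2.5, 3.5), (2.75, 3.75)]      # success speed range per column
grid = np.column_stack([(speeds >= lo) & (speeds <= hi) for lo, hi in bands]).astype(float)

def gaussian_kernel(sigma_cells):
    """Normalised 1-D Gaussian kernel truncated at 3 sigma (width in grid cells)."""
    radius = int(np.ceil(3 * sigma_cells))
    offsets = np.arange(-radius, radius + 1)
    k = np.exp(-0.5 * (offsets / sigma_cells) ** 2)
    return k / k.sum()

def smooth_1d(x, k):
    """Convolve x with symmetric kernel k, zero-padded so the length is unchanged."""
    r = k.size // 2
    return np.convolve(np.pad(x, r), k, mode="valid")

def blur(grid, sigma_speed=1.0, sigma_angle=1.0):
    """Separable 2-D Gaussian blur: smooth along speed, then along angle."""
    ks, ka = gaussian_kernel(sigma_speed), gaussian_kernel(sigma_angle)
    out = np.apply_along_axis(smooth_1d, 0, grid, ks)
    return np.apply_along_axis(smooth_1d, 1, out, ka)

smoothed = blur(grid)
i3 = np.flatnonzero(speeds == 3.0)[0]
i4 = np.flatnonzero(speeds == 4.0)[0]
print(smoothed[i3, 1], smoothed[i4, 2])  # inside vs. just outside the success region
```

Cells inside the dark region keep high values, while a cell just past the boundary, such as 4 m/s in the -0.25 column, gets a small but non-zero value, which is the lack of confidence in sharp boundaries described above.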

So both {-0.5, 3} and {0, 4} could result in a successful putt, but the model is much more confident about {-0.5, 3}.

HJAM · August 25, 2022, 11:13am

Gotcha, thanks for the explanation. I understand it more clearly now!
