Prediction using sampling

Nipun · September 16, 2019, 1:51pm

Hey!
I am trying to predict the returns of an algorithmic trading strategy on out-of-sample data using a Student's t distribution, as done in pyfolio, which uses PyMC3 for Bayesian prediction. The prediction code in pyfolio seems a bit outdated, though. I would like to know how prediction using sampling should be done, or more specifically what I am doing wrong in my current approach, since the predictions seem to be a little off. My current method is as follows:

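import numpy as np
import pymc3 as pm


def fit_returns_model(data, samples, progressbar):
    with pm.Model() as model:
        # Priors on the daily mean, the scale, and the degrees of freedom.
        mu = pm.Normal('mean returns', mu=0, sd=.01, testval=data.mean())
        sigma = pm.HalfCauchy('volatility', beta=1, testval=data.std())
        nu = pm.Exponential('nu_minus_two', 1. / 10., testval=3.)

        # Likelihood: nu + 2 keeps the Student's t variance finite.
        returns = pm.StudentT('returns', nu=nu + 2, mu=mu, sd=sigma,
                              observed=data)

        # Annualized quantities, assuming 252 trading days per year.
        pm.Deterministic('annual volatility',
                         returns.distribution.variance**.5 * np.sqrt(252))
        pm.Deterministic('sharpe', returns.distribution.mean /
                         returns.distribution.variance**.5 *
                         np.sqrt(252))

        trace = pm.sample(samples, progressbar=progressbar)
    return model, trace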

As I understand it, the above samples the model parameters (mean, volatility and degrees of freedom) and ‘fits’ them to the observed data. I then use the returned model and trace as follows:
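For reference, the two Deterministic quantities in the model can be written out per posterior draw. The sketch below assumes the standard Student's t variance, scale² · ν / (ν − 2) for ν > 2, and uses plain NumPy arrays as stand-ins for the trace columns; the function name and the sample values are illustrative, not taken from pyfolio.

```python
import numpy as np

def annualized_stats(mu, sigma, nu_minus_two, periods=252):
    """Annual volatility and Sharpe ratio per posterior draw.

    mu, sigma, nu_minus_two: arrays of draws, e.g. trace['mean returns'],
    trace['volatility'], trace['nu_minus_two'] (assumed names from the model).
    """
    nu = nu_minus_two + 2.0                      # degrees of freedom, always > 2
    daily_std = sigma * np.sqrt(nu / (nu - 2.0))  # Student's t standard deviation
    annual_vol = daily_std * np.sqrt(periods)
    sharpe = mu / daily_std * np.sqrt(periods)
    return annual_vol, sharpe

# Illustrative draws only (not real results).
mu_draws = np.array([0.0005, 0.0010])
sigma_draws = np.array([0.01, 0.01])
nu_m2_draws = np.array([2.0, 8.0])
annual_vol, sharpe = annualized_stats(mu_draws, sigma_draws, nu_m2_draws)
```

Note that the daily standard deviation is larger than `sigma` whenever the degrees of freedom are small, so heavy tails widen the predictive distribution.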

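def predict_returns(model, trace, test_data, samples, progressbar):
    with model:
        # Look up the fitted parameters by name; they are not in scope here.
        nu, mu, sigma = (model['nu_minus_two'], model['mean returns'],
                         model['volatility'])
        # Unobserved copy of the likelihood, one entry per test day.
        returns_test = pm.StudentT('returns_test', nu=nu + 2, mu=mu, sd=sigma,
                                   shape=(len(test_data), 1))
        ppc_samples = pm.sample_posterior_predictive(
            trace, samples=samples, model=model,
            var_names=['returns_test'], progressbar=progressbar)
    return trace, ppc_samples['returns_test']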

So basically I use the trace (posterior samples of the parameters fitted on the training set) to predict on the test set. I am assuming that returns are identically distributed in sample and out of sample (OOS), which is reasonable for the purpose of analyzing a strategy’s performance.

The PPC samples are then used to generate percentile cones that show which percentile zone the strategy lies in. The problem is that the cones generated this way are far too ‘conservative’: they place the strategy in the 25th to 50th percentile range even though it is performing noticeably worse than it did in sample. Monte Carlo simulations put it at the 5th percentile or below, which seems more realistic given its performance.
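To make the cone step concrete, here is a minimal sketch of how percentile cones and a percentile score might be computed from posterior predictive draws. It assumes the draws have been reshaped to (n_draws, n_days) and that daily returns are compounded into cumulative returns; the function names and the synthetic inputs are hypothetical, not pyfolio's implementation.

```python
import numpy as np

def percentile_cone(ppc_returns, percentiles=(5, 25, 50, 75, 95)):
    """Percentile bands of cumulative returns, shape (len(percentiles), n_days)."""
    # Compound each simulated path of daily returns.
    cum = np.cumprod(1.0 + ppc_returns, axis=1) - 1.0
    # Take percentiles across draws, separately for each day.
    return np.percentile(cum, percentiles, axis=0)

def realized_percentile(ppc_returns, realized_returns):
    """Percent of simulated paths that end below the realized cumulative return."""
    sim_final = np.prod(1.0 + ppc_returns, axis=1) - 1.0
    real_final = np.prod(1.0 + realized_returns) - 1.0
    return 100.0 * np.mean(sim_final < real_final)

rng = np.random.default_rng(0)
# Synthetic stand-in for ppc_samples['returns_test'] squeezed to 2-D.
fake_ppc = rng.standard_t(df=5, size=(2000, 60)) * 0.01 + 0.0005
# Synthetic realized returns: a steady daily loss over 60 days.
realized = np.full(60, -0.002)

cone = percentile_cone(fake_ppc)
score = realized_percentile(fake_ppc, realized)
```

One thing worth checking with a sketch like this is that the percentiles are taken across draws (axis 0) on de-normalized, cumulative returns; mixing up the axes or compounding z-scored values would distort the cone.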

Am I using the correct method for prediction? If so, can anyone help me understand why the ‘predictions’ come out so conservative? If not, what is the correct way to perform predictions using sampling?

NOTE: The returns are a pandas Series of daily strategy returns (as fractions, NOT percentages). I z-score normalize them before fitting and sampling, and ‘de-normalize’ the PPC samples after they are generated.
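The normalization round trip described in the note can be sketched as follows. This assumes the mean and standard deviation are computed on the training returns and reused unchanged when mapping the predictive samples back; the helper names and the synthetic data are illustrative.

```python
import numpy as np

def zscore(returns):
    """Return the z-scored array plus the statistics needed to undo it."""
    mean = returns.mean()
    std = returns.std()  # NumPy default ddof=0; pandas Series.std() uses ddof=1
    return (returns - mean) / std, mean, std

def denormalize(samples, mean, std):
    """Map z-scored samples back to the original return scale."""
    return samples * std + mean

rng = np.random.default_rng(1)
train = rng.normal(0.0005, 0.01, size=500)  # synthetic daily returns
z, train_mean, train_std = zscore(train)
restored = denormalize(z, train_mean, train_std)
```

After z-scoring, the data have mean 0 and standard deviation 1, so it may be worth checking that the priors in the model (for example `sd=.01` on the mean) are still on a sensible scale for the normalized values.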

P.S. Sorry for the long question; I just wanted to be as clear as possible about the problem I am solving and the challenge I am facing. Any help would be appreciated.
Thanks in advance!
