Multivariate version of Rolling Regression Example

Hello,

I’m trying to replicate the following from the Rolling Regression example (Rolling Regression — PyMC example gallery):

import os

import pandas as pd
import pymc as pm

# from pandas_datareader import data
# prices = data.GoogleDailyReader(symbols=['GLD', 'GFI'], end='2014-8-1').read().loc['Open', :, :]
try:
    prices = pd.read_csv(os.path.join("..", "data", "stock_prices.csv")).dropna()
except FileNotFoundError:
    prices = pd.read_csv(pm.get_data("stock_prices.csv")).dropna()

prices["Date"] = pd.DatetimeIndex(prices["Date"])
prices = prices.set_index("Date")
prices_zscored = (prices - prices.mean()) / prices.std()
prices.head()

with pm.Model(coords={"time": prices.index.values}) as model_randomwalk:
    # std of random walk
    sigma_alpha = pm.Exponential("sigma_alpha", 50.0)
    sigma_beta = pm.Exponential("sigma_beta", 50.0)

    alpha = pm.GaussianRandomWalk(
        "alpha", sigma=sigma_alpha, init_dist=pm.Normal.dist(0, 10), dims="time"
    )
    beta = pm.GaussianRandomWalk(
        "beta", sigma=sigma_beta, init_dist=pm.Normal.dist(0, 10), dims="time"
    )

    # Define regression
    regression = alpha + beta * prices_zscored.GFI.values

    # Assume prices are Normally distributed, the mean comes from the regression.
    sd = pm.HalfNormal("sd", sigma=0.1)
    likelihood = pm.Normal("y", mu=regression, sigma=sd, observed=prices_zscored.GLD.to_numpy())

    trace_rw = pm.sample(tune=2000, target_accept=0.9)

My attempt at a multivariate version is as follows:

import numpy as np
import pytensor.tensor

yobs = prices['GLD'].values
T = len(yobs)

data = prices.copy().drop(columns='GLD').assign(Constant=1)
p = 2

with pm.Model(coords={"time": prices.index.values, "xvars": data.columns.values}) as model_randomwalk:
    X = pm.MutableData("X", data.values, dims=("time", "xvars"))
    # std of random walk
    beta_sigma = pm.Exponential("beta_sigma", 50.0, dims="xvars")
    beta = pm.MvGaussianRandomWalk(
        "beta",
        mu=np.zeros(p),
        dims=("time", "xvars"),
        cov=pytensor.tensor.diag(beta_sigma),
        init_dist=pm.MvNormal.dist(0, 10 * np.eye(p)),
    )

    # Define regression
    regression = pm.math.sum(X * beta, axis=1)

    # Assume prices are Normally distributed, the mean comes from the regression.
    sd = pm.HalfNormal("sd", sigma=0.1)
    likelihood = pm.Normal("y", mu=regression, sigma=sd, observed=yobs)

    trace = pm.sample(2000, tune=1000, target_accept=0.9)

My thinking is that beta should be T x 2, with the first column equal to beta in the original example and the second column equal to alpha; I can then multiply X and beta elementwise and sum the two entries in each row to get the regression mean.
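
As a toy sanity check of that reshaping (the numbers below are made up, just to illustrate the elementwise-multiply-and-sum step):

import numpy as np

# Made-up toy values, only to check the "multiply elementwise, then sum each row" logic
x = np.array([0.5, -1.0, 2.0, 0.3])      # stand-in for the GFI z-scores
alpha = np.array([0.1, 0.2, 0.3, 0.4])   # time-varying intercept
beta = np.array([1.0, 1.5, 2.0, 2.5])    # time-varying slope

X = np.column_stack([x, np.ones_like(x)])  # columns: GFI, Constant
B = np.column_stack([beta, alpha])         # columns: slope, intercept

assert np.allclose((X * B).sum(axis=1), alpha + beta * x)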

This model samples fine (though much more slowly) and produces the same posterior predictive as the Rolling Regression example, yet the values for alpha and beta don’t look as “clean” as they do there, and their scale is way off; e.g. the betas are between 3 and 9 and the alphas reach almost into the 100s.
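
(For what it’s worth, I’m judging the scale from the posterior means, roughly like the sketch below, where trace is the InferenceData returned by pm.sample above:)

# Rough scale check (sketch): posterior mean of each coefficient path over time.
# The "Constant" column of beta plays the role of alpha in the original example.
beta_mean = trace.posterior["beta"].mean(dim=("chain", "draw"))  # dims: (time, xvars)
print(beta_mean.min(dim="time").values)  # per-column minimum over time
print(beta_mean.max(dim="time").values)  # per-column maximum over time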

Is there something obvious that I’m doing wrong? Thank you in advance for any suggestions.

If you are using pymc > 4, the time dimension of the MvGaussianRandomWalk (as well as of the univariate one) has to be the rightmost dimension.

Thanks! So I realized I had a typo in my code and should have been referencing prices_zscored and not prices. When I change that, the model still takes a long time to run, but I get back alpha and beta as in the Rolling Regression example.

I am using pymc version 5.0.1, so your comment applies. However, if I try to switch the dimensions on beta as below, then every beta returned ends up being 2x2. Am I still doing something wrong?

yobs = prices_zscored[['GLD']].values
T = len(yobs)

data = prices_zscored.copy().drop(columns='GLD').assign(Constant=1)
p = 2

with pm.Model(coords={"time": data.index.values, "xvars": data.columns.values}) as model_randomwalk:
    X = pm.MutableData("X", data.values, dims=("time", "xvars"))
    # std of random walk
    beta_sigma = pm.Exponential("beta_sigma", 50.0, dims="xvars")
    beta = pm.MvGaussianRandomWalk(
        "beta",
        mu=np.zeros(p),
        dims=("xvars", "time"),
        cov=pytensor.tensor.diag(beta_sigma),
        init_dist=pm.MvNormal.dist(0, 10 * np.eye(p)),
    )
....

Sorry, my bad: the MvGaussianRandomWalk has shape (batch dimensions x time x core dimension), in that order. So your original ordering was correct.
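
If it helps, one quick way to confirm the ordering is to take a prior draw from the distribution outside of any model (a minimal sketch; the parameter values here are arbitrary):

import numpy as np
import pymc as pm

p = 2  # core ("xvars") dimension

rw = pm.MvGaussianRandomWalk.dist(
    mu=np.zeros(p),
    cov=np.eye(p),
    init_dist=pm.MvNormal.dist(np.zeros(p), 10 * np.eye(p)),
    steps=4,
)
# The draw has time on the second-to-last axis and the core (xvars) dimension last
print(pm.draw(rw).shape)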

No worries!