Thanks for your message; it was helpful.
So if I have a df with three columns (income, employment, consumption), should k be 3 in the following equation?
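```python
import pymc as pm

k = df.shape[1]  # (n, k): k is the number of columns in df
with pm.Model() as model:
    # define priors / parameters
    factors = pm.Normal('factors', mu=0, sigma=10, size=(k,))
    # linear predictor for the likelihood: (n, k) @ (k,) -> (n,)
    equation = pm.math.dot(df.to_numpy(), factors)
```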
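To check the shapes in the snippet above without PyMC, here is a plain NumPy sketch. The random data and the number of rows are made up for illustration; the design matrix has one column per indicator, so k is its column count, and multiplying by a length-k coefficient vector gives one value per row.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 120  # hypothetical number of monthly observations
columns = ["income", "employment", "consumption"]
X = rng.normal(size=(n, len(columns)))  # stand-in for df.to_numpy(), shape (n, k)

k = X.shape[1]  # one coefficient per predictor column
factors = rng.normal(0.0, 10.0, size=k)  # one draw from the Normal(0, 10) prior

equation = X @ factors  # (n, k) @ (k,) -> (n,)
print(k, equation.shape)  # 3 (120,)
```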
And yes, I am trying to combine economic indicators into a single index in order to predict GDP. Since these three indicators are combined to create the principal component, which is the index, I assumed that the principal component data should be the observed data.
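One common way to frame this, offered here as an assumption, is to treat GDP as the observed variable and the index as a predictor. The sketch below uses a conjugate Normal prior with a known noise standard deviation so the posterior has a closed form in NumPy instead of needing PyMC sampling. The data, `sigma`, `tau`, and the 2.5 threshold are all hypothetical.

```python
import math
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: a single index (e.g. a principal component) and GDP growth.
n = 200
index = rng.normal(size=n)
gdp = 2.0 + 0.5 * index + rng.normal(0.0, 0.3, size=n)

# Design matrix with an intercept column; GDP is the observed variable.
X = np.column_stack([np.ones(n), index])

sigma = 0.3  # assumed known noise standard deviation
tau = 10.0   # prior: each coefficient ~ Normal(0, tau^2)

# Conjugate Gaussian posterior over [intercept, slope].
precision = X.T @ X / sigma**2 + np.eye(2) / tau**2
post_cov = np.linalg.inv(precision)
post_mean = post_cov @ X.T @ gdp / sigma**2

# Posterior predictive distribution of GDP for a new index value of 1.5.
x_new = np.array([1.0, 1.5])
pred_mean = x_new @ post_mean
pred_sd = math.sqrt(x_new @ post_cov @ x_new + sigma**2)

# Probability that GDP exceeds a threshold under the predictive Normal.
threshold = 2.5
p_above = 0.5 * (1.0 - math.erf((threshold - pred_mean) / (pred_sd * math.sqrt(2.0))))
print(post_mean, pred_mean, pred_sd, p_above)
```

With an unknown noise level or more predictors, a sampler such as PyMC becomes the more natural tool; the closed form is only meant to show what "probabilities on GDP" look like as a predictive distribution.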
Ideally, the final goal is a model that recalibrates every month. It would select around 20 of a total of 100 economic time series, depending on which series have the most explanatory power for GDP; apply weights to those 20 series, again based on explanatory power; and then use Bayesian inference to produce probabilities for what GDP should be, given what the 20 most relevant series are showing. This is what led me to try combining PCA with a Bayesian approach.
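The selection-and-weighting step could look roughly like the sketch below. It assumes that "explanatory power" is measured by absolute Pearson correlation with GDP over a trailing window and that weights are proportional to that correlation; both are illustrative choices, and `select_and_weight` is a hypothetical helper. Rerunning it each month on a rolling window would give the recalibration.

```python
import numpy as np

def select_and_weight(series, gdp, top_k=20):
    """Pick the top_k series by |correlation| with GDP and weight them.

    series: array of shape (n_periods, n_series); gdp: array of shape (n_periods,).
    Returns (selected column indices, weights summing to 1, composite index).
    """
    # Standardize each series and GDP so correlations are simple dot products.
    z = (series - series.mean(axis=0)) / series.std(axis=0)
    g = (gdp - gdp.mean()) / gdp.std()
    corr = z.T @ g / len(g)  # Pearson correlation of each series with GDP

    selected = np.argsort(-np.abs(corr))[:top_k]
    weights = np.abs(corr[selected]) / np.abs(corr[selected]).sum()

    # Flip negatively correlated series so every term moves with GDP.
    composite = z[:, selected] @ (weights * np.sign(corr[selected]))
    return selected, weights, composite

# Hypothetical data: 100 candidate series, of which the first five track GDP.
rng = np.random.default_rng(2)
n_periods, n_series = 120, 100
gdp = rng.normal(size=n_periods)
series = rng.normal(size=(n_periods, n_series))
series[:, :5] += 2.0 * gdp[:, None]

# Recalibrate on a trailing 60-month window, as one might do each month.
window = slice(n_periods - 60, n_periods)
selected, weights, composite = select_and_weight(series[window], gdp[window])
print(sorted(selected[:5].tolist()), weights.sum())
```

Correlation-based selection ignores redundancy between series; regularized regression or a Bayesian shrinkage prior would be alternatives worth comparing.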
I have been working with sklearn's PCA object: I created a df with my economic indicators (excluding GDP) and produced a single principal component whose adjusted R-squared with GDP is in the 60s. But I think a Bayesian approach could improve the model.
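For reference, the PCA-plus-regression baseline described above can be sketched in plain NumPy. The first component's scores from an SVD of the centred data should match sklearn's `PCA(n_components=1).fit_transform` up to sign; standardizing the indicators first is an added assumption, since sklearn's PCA only centres. The data and both helper functions are hypothetical.

```python
import numpy as np

def first_principal_component(X):
    """Scores on the first principal component of the columns of X.

    Matches PCA(n_components=1).fit_transform on the same data up to
    sign (the sign of a principal component is arbitrary).
    """
    Xc = X - X.mean(axis=0)
    # Rows of vt are principal directions, ordered by singular value.
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[0]

def adjusted_r2(y, x):
    """Adjusted R^2 of an OLS regression of y on a single predictor x."""
    n, p = len(y), 1
    A = np.column_stack([np.ones(n), x])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

# Hypothetical data: three indicators driven by one latent factor.
rng = np.random.default_rng(3)
n = 200
latent = rng.normal(size=n)
indicators = latent[:, None] + 0.5 * rng.normal(size=(n, 3))
indicators = (indicators - indicators.mean(axis=0)) / indicators.std(axis=0)
gdp = latent + rng.normal(size=n)

pc1 = first_principal_component(indicators)
print(round(adjusted_r2(gdp, pc1), 2))
```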