I’m neither a statistician nor a programmer, but I try to learn about both outside of my professional work. I am interested in building a model with PyMC3 that combines PCA with Bayesian inference, but I’m not sure how to do it properly. As I understand it, PCA combines multiple variables into principal components, and with Bayesian inference I need to assume priors, which are distributions over certain parameters, and then use them to build a function whose output distribution hopefully doesn’t deviate too much from the actual data. But I get confused when it comes to the specifics.
For example, say I want to build a model to predict GDP, and I have three economic time series that I combine with PCA to produce the principal components (PCs). I assume I now have to break apart the equation that produces the PCs and set priors on its pieces. So I put my three economic indicators into a dataframe called ‘df’ and use the PCA function from sklearn to get the ‘principal_components’. With some reverse engineering I learned that the factors (the weights) are multiplied with ‘df’ to produce the principal components. So here is what I built:
    with pm.Model() as pca_model:
        # define priors / parameters
        factors = pm.Normal('factors', mu=0, sigma=10)
        # define likelihood
        equation = (factors * df).sum(axis=1)
        likelihood = pm.Normal('y', mu=equation, sd=1, observed=principal_components)
        # posterior
        trace = pm.sample(1000)
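For context, the reverse-engineering step can be sketched with NumPy alone. This is only a minimal sketch on made-up data: the array `X` is a hypothetical stand-in for ‘df’, every name in it is my own, and it assumes the usual PCA recipe of centering each column and projecting onto the right singular vectors. As far as I can tell, sklearn’s `PCA` exposes the equivalent quantities as `mean_` and `components_`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for df: 200 observations of 3 correlated indicators.
latent = rng.normal(size=(200, 1))
X = latent @ np.array([[1.0, 0.8, 0.5]]) + 0.3 * rng.normal(size=(200, 3))

# PCA via SVD of the column-centered data.
mean = X.mean(axis=0)
X_centered = X - mean
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

# Each row of Vt is the weight vector of one component: shape (3, 3).
weights = Vt

# Every principal component is a weighted sum of the centered columns.
scores = X_centered @ weights.T  # shape (200, 3)

# The first component, written the same way as `equation` in the model above.
pc1 = (weights[0] * X_centered).sum(axis=1)

# Share of total variance carried by each component.
explained_variance_ratio = S**2 / np.sum(S**2)
```

In this sketch each component has one weight per column, so three indicators give three weights per component, and the data are centered before the weights are applied.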
I’m hoping someone can point out where I’m going wrong, because I’m sure I am misunderstanding something.