Factor Analysis

alphamaximus · August 4, 2017, 9:04pm

Hello,

I’m interested in a Bayesian version of Factor Analysis.

Has anyone seen examples of this?

Thanks!

junpenglao · August 5, 2017, 7:58am

We have a probabilistic factor analysis example in the doc:

github.com

pymc-devs/pymc3/blob/master/docs/source/notebooks/probabilistic_matrix_factorization.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Probabilistic Matrix Factorization for Making Personalized Recommendations"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The model discussed in this analysis was developed by Ruslan Salakhutdinov and Andriy Mnih. All of the code and supporting text, when not referenced, is the original work of [Mack Sweeney](https://www.linkedin.com/in/macksweeney).\n",
    "\n",
    "## Motivation\n",
    "\n",
    "Say I download a handbook of a hundred jokes, and I'd like to know very quickly which ones will be my favorite. So maybe I read a few, I laugh, I read a few more, I stop laughing, and I indicate on a scale of -10 to 10 how funny I thought each joke was. Maybe I do this for 5 jokes out of the 100. Now I go to the back of the book, and there's a little program included for calculating my preferences for all the other jokes. I enter in my preference numbers and shazam! The program spits out a list of all 100 jokes, sorted in the order I'll like them. That certainly would be nice. Today we'll write a program that does exactly this.\n",
    "\n",
    "We'll start out by getting some intuition for how our model will work. Then we'll formalize our intuition. Afterwards, we'll examine the dataset we are going to use. Once we have some notion of what our data looks like, we'll define some baseline methods for predicting preferences for jokes. Following that, we'll look at Probabilistic Matrix Factorization (PMF), which is a more sophisticated Bayesian method for predicting preferences. Having detailed the PMF model, we'll use PyMC3 for MAP estimation and MCMC inference. Finally, we'll compare the results obtained with PMF to those obtained from our baseline methods and discuss the outcome.\n",

This file has been truncated. show original

alphamaximus · August 7, 2017, 3:50am

Thanks. I’ve seen this before, but it didn’t connect with me that this is factor analysis.

I’m going through the notebook, and in the __init__ of the PMF object, it reads

# Perform mean value imputation
nan_mask = np.isnan(self.data)
self.data[nan_mask] = self.data[~nan_mask].mean()

I understand why it’s here, but it doesn’t seem necessary, nor very Bayesian.

So I commented it out and I got this error:

ValueError Traceback (most recent call last)
in ()
1 # Find MAP for PMF.
----> 2 pmf.find_map()

in _find_map(self)
52 with self.model:
53 logging.info(‘finding PMF MAP using Powell optimization…’)
—> 54 self._map = pm.find_MAP(fmin=sp.optimize.fmin_powell, disp=True)
55
56 elapsed = int(time.time() - tstart)

~/anaconda/lib/python3.6/site-packages/pymc3-3.1-py3.6.egg/pymc3/tuning/starting.py in find_MAP(start, vars, fmin, return_raw, model, live_disp, callback, *args, **kwargs)
166 "density. 2) your distribution logp’s are " +
167 “properly specified. Specific issues: \n” +
→ 168 specific_errors)
169 mx = {v.name: mx[v.name].astype(v.dtype) for v in model.vars}
170

ValueError: Optimization error: max, logp or dlogp at max have non-finite values. Some values may be outside of distribution support.

Anyone have any thoughts on why these lines are necessary?

Thanks!

junpenglao · August 7, 2017, 6:50am

If there are missing value in your observed data, it can create problems (PyMC3 can handle missing data, but if the missing data is discrete it fails back to Metropolis which is not ideal). Here in the example, the missing data (NaN in this case) was replaced by the mean.

alphamaximus · August 7, 2017, 6:00pm

Thanks.

I’m still a little confused. When you say the missing data is discrete, do you mean non-continuous?

In the notebook, it has:

R = pm.Normal(
    'R', mu=theano.tensor.dot(U, V.T), tau=self.alpha * np.ones((n, m)),
    observed=self.data)

So shouldn’t it model the missing value as a Normal variable, hence continuous, and therefore, use NUTS?

junpenglao · August 7, 2017, 6:21pm

Yes in this case the missing value will also sampled using NUTS. For masking a missing value you can have a look at this example: https://github.com/pymc-devs/pymc3/blob/master/pymc3/examples/lasso_missing.py

The handling of missing data is in line 24 and 28.

alphamaximus · August 7, 2017, 7:05pm

Thanks…makes sense.

alphamaximus · August 13, 2017, 7:20pm

Hi,

So as suggested, I followed the probabilistic matrix factorization notebook, and modified it for my specific application, which is factor analysis for stocks.

After all the modifications, my code looks like this:

import multiprocessing
import numpy as np
import pymc3 as pm
import scipy
import seaborn as sns
import theano
import theano.tensor as tt


# Enable on-the-fly graph computations, but ignore absence of intermediate test values.
theano.config.compute_test_value = 'ignore'


def factors(dataset, n_factors=1, verbose=0, draws=10000, target_accept=0.90):
    '''
    ## References
    - https://pymc-devs.github.io/pymc3/notebooks/pmf-pymc.html
    '''

    data        = dataset.values.copy()
    T, n_stocks = data.shape

    with pm.Model() as model:

        # U
        factor_returns_std = pm.Exponential(
            name = 'factor_returns_std',
            lam  = 1.0 / data.std(axis=1).mean()
        )

        factor_returns_alpha = pm.Deterministic(
            name = 'factor_returns_alpha',
            var  = 1.0 / factor_returns_std ** 2
        )

        factor_returns = pm.MvNormal(
            name    = 'factor_returns',
            mu      = 0,
            tau     = factor_returns_alpha * np.eye(n_factors),
            shape   = (T, n_factors),
            testval = np.random.randn(T, n_factors) * factor_returns_std
        )

        # V
        beta_std   = 1.0
        beta_alpha = 1.0 / beta_std ** 2

        beta = pm.MvNormal(
            name    = 'beta',
            mu      = 1.0,
            tau     = beta_alpha * np.eye(n_factors),
            shape   = (n_stocks, n_factors),
            testval = np.random.randn(n_stocks, n_factors) * beta_std
        )

        # R        
        idiosyncratic_std = pm.Exponential(
            name = 'idiosyncratic_std',
            lam  = 1.0 / data.std(axis=1).mean()
        )
        
        idiosyncratic_alpha = pm.Deterministic(
            name = 'idiosyncratic_alpha',
            var  = 1.0 / idiosyncratic_std ** 2
        )

        stock_returns = pm.Normal(
            name     = 'stock_returns',
            mu       = tt.dot(factor_returns, beta.T),
            tau      = idiosyncratic_alpha * np.ones((T, n_stocks)),
            observed = data
        )

        cov = pm.Deterministic(
            name = 'cov',
            var  = factor_returns_std ** 2 * tt.dot(beta, beta.T) + (idiosyncratic_std ** 2) * np.eye(n_stocks)
        )

        stock_std = pm.Deterministic(
            name = 'stock_std',
            var  = tt.sqrt(tt.diag(cov))
        )

#         _map = pm.find_MAP(fmin=sp.optimize.fmin_powell, disp=True)

        trace = pm.sample(
            draws            = draws / multiprocessing.cpu_count(),
            init             = 'advi',
            n_init           = 150000,
            progressbar      = verbose > 0,
            njobs            = multiprocessing.cpu_count(),
            nuts_kwargs      = {'target_accept': target_accept},
            # live_plot        = verbose > 0,
            # live_plot_kwargs = {'varnames': ['mu', 'sd'], 'combined': True},
        )

    if verbose > 0:

        pm.summary(trace=trace, varnames=['stock_std', 'idiosyncratic_std', 'factor_returns_std'])

        pm.traceplot(
            trace    = trace,
            varnames = ['stock_std', 'idiosyncratic_std', 'factor_returns_std'],
            combined = True
        )

    return trace

trace = factors(
    dataset   = returns,
    n_factors = 5,
    draws     = 10000,
    verbose   = 1
)

For my particular dataset, my results look like this:

stock_std:

  Mean             SD               MC Error         95% HPD interval
  -------------------------------------------------------------------
  
  0.008            0.000            0.000            [0.008, 0.009]
  0.010            0.000            0.000            [0.009, 0.011]
  0.009            0.000            0.000            [0.008, 0.009]
  0.008            0.000            0.000            [0.007, 0.008]
  0.011            0.000            0.000            [0.010, 0.012]
  0.011            0.000            0.000            [0.010, 0.011]
  0.010            0.000            0.000            [0.009, 0.010]
  0.010            0.000            0.000            [0.010, 0.011]
  0.005            0.000            0.000            [0.005, 0.006]
  0.010            0.000            0.000            [0.009, 0.011]

  Posterior quantiles:
  2.5            25             50             75             97.5
  |--------------|==============|==============|--------------|
  
  0.008          0.008          0.008          0.009          0.009
  0.009          0.010          0.010          0.010          0.011
  0.008          0.008          0.009          0.009          0.009
  0.007          0.007          0.008          0.008          0.008
  0.010          0.011          0.011          0.011          0.012
  0.010          0.010          0.011          0.011          0.011
  0.009          0.009          0.010          0.010          0.010
  0.010          0.010          0.010          0.011          0.011
  0.005          0.005          0.005          0.005          0.006
  0.009          0.010          0.010          0.010          0.011


idiosyncratic_std:

  Mean             SD               MC Error         95% HPD interval
  -------------------------------------------------------------------
  
  0.004            0.000            0.000            [0.004, 0.004]

  Posterior quantiles:
  2.5            25             50             75             97.5
  |--------------|==============|==============|--------------|
  
  0.004          0.004          0.004          0.004          0.004


factor_returns_std:

  Mean             SD               MC Error         95% HPD interval
  -------------------------------------------------------------------
  
  0.167            0.060            0.006            [0.052, 0.243]

  Posterior quantiles:
  2.5            25             50             75             97.5
  |--------------|==============|==============|--------------|
  
  0.054          0.119          0.188          0.209          0.248

The trace plot for the factor_returns_std looks worrying.

What would be the best strategy to from this point? A longer trace?

Thanks!

junpenglao · August 13, 2017, 8:43pm

I would say it might be the choice of prior. The factor_returns in your model is actually just a pm.Normal with shape=(T, n_factors), as they all have the same tau factor_returns_alpha * np.eye(n_factors). Could you try just using a pm.Normal, with the sd using a HalfCauchy as prior?

junpenglao · August 13, 2017, 8:44pm

Also, try the auto init or init='advi+adapt_diag' (you might need to update to master first). They should perform better.

alphamaximus · August 14, 2017, 3:17am

Changing the init not does seem to help.

advi

advi+adapt_diag

auto

Looks like ‘advi’ works the best.

alphamaximus · August 14, 2017, 3:57am

I would say it might be the choice of prior. The factor_returns in your model is actually just a pm.Normal with shape=(T, n_factors), as they all have the same tau factor_returns_alpha * np.eye(n_factors). Could you try just using a pm.Normal, with the sd using a HalfCauchy as prior?

I changed:

factor_returns = pm.MvNormal(
    name    = 'factor_returns',
    mu      = 0,
    tau     = factor_returns_alpha * np.eye(n_factors),
    shape   = (T, n_factors),
    testval = np.random.randn(T, n_factors) * factor_returns_std
)

beta = pm.MvNormal(
    name    = 'beta',
    mu      = 1.0,
    tau     = beta_alpha * np.eye(n_factors),
    shape   = (n_stocks, n_factors),
    testval = np.random.randn(n_stocks, n_factors) * beta_std
)

to:

factor_returns = pm.Normal(
    name    = 'factor_returns',
    mu      = 0,
    sd      = factor_returns_std,
    shape   = (T, n_factors),
    testval = np.random.randn(T, n_factors) * factor_returns_std
)

beta = pm.Normal(
    name    = 'beta',
    mu      = 1.0,
    sd      = beta_std,
    shape   = (n_stocks, n_factors),
    testval = np.random.randn(n_stocks, n_factors) * beta_std
)

Results are slightly better now with 'advi':

Distribution plot for factor_returns_std looks better but the trace plot still not ideal.

junpenglao · August 14, 2017, 5:53am

Did you also change the prior for sd? Also, any warning during the sampling?

Also, could you please plot the trace with combined = false?

alphamaximus · August 14, 2017, 12:11pm

Did you also change the prior for sd?

In the above plots, no.

Exponential doesn’t seem that different than a HalfCauchy, but perhaps I am wrong?

Also, any warning during the sampling?

UserWarning: Chain 0 reached the maximum tree depth. Increase max_treedepth, increase target_accept or reparameterize.
UserWarning: The acceptance probability in chain 0 does not match the target. It is 0.128292122599, but should be close to 0.9. Try to increase the number of tuning steps.
UserWarning: Chain 1 reached the maximum tree depth. Increase max_treedepth, increase target_accept or reparameterize.
UserWarning: The acceptance probability in chain 1 does not match the target. It is 0.39805958331, but should be close to 0.9. Try to increase the number of tuning steps.
UserWarning: Chain 2 reached the maximum tree depth. Increase max_treedepth, increase target_accept or reparameterize.
UserWarning: The acceptance probability in chain 2 does not match the target. It is 0.80128442096, but should be close to 0.9. Try to increase the number of tuning steps.
UserWarning: Chain 3 reached the maximum tree depth. Increase max_treedepth, increase target_accept or reparameterize.
UserWarning: The acceptance probability in chain 3 does not match the target. It is 0.705833520306, but should be close to 0.9. Try to increase the number of tuning steps.
UserWarning: Chain 4 reached the maximum tree depth. Increase max_treedepth, increase target_accept or reparameterize.
UserWarning: The acceptance probability in chain 4 does not match the target. It is 0.758905225987, but should be close to 0.9. Try to increase the number of tuning steps.
UserWarning: Chain 5 reached the maximum tree depth. Increase max_treedepth, increase target_accept or reparameterize.
UserWarning: The acceptance probability in chain 5 does not match the target. It is 0.472832357998, but should be close to 0.9. Try to increase the number of tuning steps.

Also, could you please plot the trace with combined = false?

In this plot, I’m using HalfCauchy for priors:

factor_returns_std = pm.HalfCauchy(
    name = 'factor_returns_std',
    beta = 1,
)

idiosyncratic_std = pm.HalfCauchy(
    name = 'idiosyncratic_std',
    beta = 1,
)

junpenglao · August 14, 2017, 1:39pm

HalfCauchy is more conventionally used as the prior for standard deviation.

I have the impression the trace of factor_return_std is slowing drifting towards .1, but because of the high target_accept it’s taking very small steps. So just a hunch maybe try to set the starting value to .1? You could do so by providing a test_val:

factor_returns_std = pm.HalfCauchy(name = 'factor_returns_std', beta = 1, test_val=.1)

aseyboldt · August 14, 2017, 2:10pm

The problem seems to be the centered parametrization for factor_returns, it should work much better if you use a non-centered parametrization:

factor_returns_raw = pm.Normal(
    'factor_returns_raw', mu=0, sd=1, shape=(T, n_factors))
factor_returns = pm.Deterministic(
    'factor_returns', factor_returns_raw * factor_returns_std)

And probably the same for beta, too.

Otherwise: As junpenglao said, don’t use a MvNormal if you don’t have to. The results will be the same, but it is much slower.
Using sd instead of tau is considered best practice (unless you happen to have the precision matrix in a mv setting), most people think in terms of scale parameters, not precision. There really isn’t a speed difference.
The usual goto prior for scale parameters is HalfStudentT or HalfCauchy (or maybe HalfFlat). I’ve never seen anyone use an Exponential before, and I can’t think of a reason why you would. The prior for idiosyncratic_std probably doesn’t matter at all, there is enough data available anyway. But the prior on the scale of the beta might very well matter.
Sure you want to assume that idiosyncratic_std is the same for all stocks? That doesn’t sound right to me…

I hope that helps

alphamaximus · August 14, 2017, 9:10pm

Thanks, all the feedback has been really helpful!

Although I could be wrong, I don’t think the scale of beta matters as long as the scale of the factor returns is flexible. This is why I have:

beta_std = 1.0

factor_returns_std = pm.HalfCauchy(
    name    = 'factor_returns_std',
    beta    = 1,
    testval = data.std(axis=1).mean()
)

But maybe my intuition is incorrect and I should also model beta_std as a pm.HalfCauchy. Thoughts?

Yeah, I wanted to limit the complexity of the model, but I recently added this:

idiosyncratic_std = pm.HalfCauchy(
    name    = 'idiosyncratic_std',
    beta    = 1,
    shape   = n_stocks,
    testval = data.std(axis=1).mean()
)

alphamaximus · August 15, 2017, 3:15am

I switched to the default target_accept with better results. Thanks!

junpenglao:

So just a hunch maybe try to set the starting value to .1? You could do so by providing a test_val:
factor_returns_std = pm.HalfCauchy(name = 'factor_returns_std', beta = 1, test_val=.1)

This doesn’t help much. In the context of having init = 'advi', I feel that the testval doesn’t do much. Is this not correct?

Thank you so much for your help! That has all been very useful!

alphamaximus · August 16, 2017, 1:43pm

I’ve created a Github repository for this work. You can check it out here:

github.com

jonathanng/shared/blob/master/notebooks/bayesian_correlation.ipynb

{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2017-08-15T20:32:52.532468Z",
     "start_time": "2017-08-15T20:32:52.469518Z"
    },
    "scrolled": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style>\n",
       "    .dataframe thead tr:only-child th {\n",
       "        text-align: right;\n",

This file has been truncated. show original

Please send me feedback if you have any. Thanks!

alphamaximus · August 17, 2017, 1:52pm

Also, what if I want to make the n_factors a discrete variable such as a DiscreteUniform? Is this possible/practical?

Topic		Replies	Views
Beta process (or truncated Indian Buffet process) factor analysis in the vein of the Dirichlet Process example Questions	8	1995	May 17, 2021
Unique solution for probabilistic PCA Questions	47	4210	March 8, 2019
How to decide on what priors distributions to use for parameters? Questions	22	5822	January 7, 2022
Large Scale Factor Analysis with minibatch ADVI Questions	9	2954	August 20, 2017
Computing Bayes Factor with Bernoulli Distribution v5 modeling	11	94	June 3, 2025

Factor Analysis

Related topics