Factor Analysis

Hello,

I’m interested in a Bayesian version of Factor Analysis.

Has anyone seen examples of this?

Thanks!

We have a probabilistic factor analysis example in the doc:

Thanks. I’ve seen this before, but it didn’t connect with me that this is factor analysis.

I’m going through the notebook, and in the __init__ of the PMF object, it reads

# Perform mean value imputation
nan_mask = np.isnan(self.data)
self.data[nan_mask] = self.data[~nan_mask].mean()

I understand why it’s here, but it doesn’t seem necessary, nor very Bayesian.

So I commented it out and I got this error:


ValueError Traceback (most recent call last)
in ()
1 # Find MAP for PMF.
----> 2 pmf.find_map()

in _find_map(self)
52 with self.model:
53 logging.info(‘finding PMF MAP using Powell optimization…’)
—> 54 self._map = pm.find_MAP(fmin=sp.optimize.fmin_powell, disp=True)
55
56 elapsed = int(time.time() - tstart)

~/anaconda/lib/python3.6/site-packages/pymc3-3.1-py3.6.egg/pymc3/tuning/starting.py in find_MAP(start, vars, fmin, return_raw, model, live_disp, callback, *args, **kwargs)
166 "density. 2) your distribution logp’s are " +
167 “properly specified. Specific issues: \n” +
–> 168 specific_errors)
169 mx = {v.name: mx[v.name].astype(v.dtype) for v in model.vars}
170

ValueError: Optimization error: max, logp or dlogp at max have non-finite values. Some values may be outside of distribution support.

Anyone have any thoughts on why these lines are necessary?

Thanks!

If there are missing value in your observed data, it can create problems (PyMC3 can handle missing data, but if the missing data is discrete it fails back to Metropolis which is not ideal). Here in the example, the missing data (NaN in this case) was replaced by the mean.

Thanks.

I’m still a little confused. When you say the missing data is discrete, do you mean non-continuous?

In the notebook, it has:

R = pm.Normal(
    'R', mu=theano.tensor.dot(U, V.T), tau=self.alpha * np.ones((n, m)),
    observed=self.data)

So shouldn’t it model the missing value as a Normal variable, hence continuous, and therefore, use NUTS?

Yes in this case the missing value will also sampled using NUTS. For masking a missing value you can have a look at this example: https://github.com/pymc-devs/pymc3/blob/master/pymc3/examples/lasso_missing.py

The handling of missing data is in line 24 and 28.

Thanks…makes sense.

Hi,

So as suggested, I followed the probabilistic matrix factorization notebook, and modified it for my specific application, which is factor analysis for stocks.

After all the modifications, my code looks like this:

import multiprocessing
import numpy as np
import pymc3 as pm
import scipy
import seaborn as sns
import theano
import theano.tensor as tt


# Enable on-the-fly graph computations, but ignore absence of intermediate test values.
theano.config.compute_test_value = 'ignore'


def factors(dataset, n_factors=1, verbose=0, draws=10000, target_accept=0.90):
    '''
    ## References
    - https://pymc-devs.github.io/pymc3/notebooks/pmf-pymc.html
    '''

    data        = dataset.values.copy()
    T, n_stocks = data.shape

    with pm.Model() as model:

        # U
        factor_returns_std = pm.Exponential(
            name = 'factor_returns_std',
            lam  = 1.0 / data.std(axis=1).mean()
        )

        factor_returns_alpha = pm.Deterministic(
            name = 'factor_returns_alpha',
            var  = 1.0 / factor_returns_std ** 2
        )

        factor_returns = pm.MvNormal(
            name    = 'factor_returns',
            mu      = 0,
            tau     = factor_returns_alpha * np.eye(n_factors),
            shape   = (T, n_factors),
            testval = np.random.randn(T, n_factors) * factor_returns_std
        )

        # V
        beta_std   = 1.0
        beta_alpha = 1.0 / beta_std ** 2

        beta = pm.MvNormal(
            name    = 'beta',
            mu      = 1.0,
            tau     = beta_alpha * np.eye(n_factors),
            shape   = (n_stocks, n_factors),
            testval = np.random.randn(n_stocks, n_factors) * beta_std
        )

        # R        
        idiosyncratic_std = pm.Exponential(
            name = 'idiosyncratic_std',
            lam  = 1.0 / data.std(axis=1).mean()
        )
        
        idiosyncratic_alpha = pm.Deterministic(
            name = 'idiosyncratic_alpha',
            var  = 1.0 / idiosyncratic_std ** 2
        )

        stock_returns = pm.Normal(
            name     = 'stock_returns',
            mu       = tt.dot(factor_returns, beta.T),
            tau      = idiosyncratic_alpha * np.ones((T, n_stocks)),
            observed = data
        )

        cov = pm.Deterministic(
            name = 'cov',
            var  = factor_returns_std ** 2 * tt.dot(beta, beta.T) + (idiosyncratic_std ** 2) * np.eye(n_stocks)
        )

        stock_std = pm.Deterministic(
            name = 'stock_std',
            var  = tt.sqrt(tt.diag(cov))
        )

#         _map = pm.find_MAP(fmin=sp.optimize.fmin_powell, disp=True)

        trace = pm.sample(
            draws            = draws / multiprocessing.cpu_count(),
            init             = 'advi',
            n_init           = 150000,
            progressbar      = verbose > 0,
            njobs            = multiprocessing.cpu_count(),
            nuts_kwargs      = {'target_accept': target_accept},
            # live_plot        = verbose > 0,
            # live_plot_kwargs = {'varnames': ['mu', 'sd'], 'combined': True},
        )

    if verbose > 0:

        pm.summary(trace=trace, varnames=['stock_std', 'idiosyncratic_std', 'factor_returns_std'])

        pm.traceplot(
            trace    = trace,
            varnames = ['stock_std', 'idiosyncratic_std', 'factor_returns_std'],
            combined = True
        )

    return trace

trace = factors(
    dataset   = returns,
    n_factors = 5,
    draws     = 10000,
    verbose   = 1
)

For my particular dataset, my results look like this:

stock_std:

  Mean             SD               MC Error         95% HPD interval
  -------------------------------------------------------------------
  
  0.008            0.000            0.000            [0.008, 0.009]
  0.010            0.000            0.000            [0.009, 0.011]
  0.009            0.000            0.000            [0.008, 0.009]
  0.008            0.000            0.000            [0.007, 0.008]
  0.011            0.000            0.000            [0.010, 0.012]
  0.011            0.000            0.000            [0.010, 0.011]
  0.010            0.000            0.000            [0.009, 0.010]
  0.010            0.000            0.000            [0.010, 0.011]
  0.005            0.000            0.000            [0.005, 0.006]
  0.010            0.000            0.000            [0.009, 0.011]

  Posterior quantiles:
  2.5            25             50             75             97.5
  |--------------|==============|==============|--------------|
  
  0.008          0.008          0.008          0.009          0.009
  0.009          0.010          0.010          0.010          0.011
  0.008          0.008          0.009          0.009          0.009
  0.007          0.007          0.008          0.008          0.008
  0.010          0.011          0.011          0.011          0.012
  0.010          0.010          0.011          0.011          0.011
  0.009          0.009          0.010          0.010          0.010
  0.010          0.010          0.010          0.011          0.011
  0.005          0.005          0.005          0.005          0.006
  0.009          0.010          0.010          0.010          0.011


idiosyncratic_std:

  Mean             SD               MC Error         95% HPD interval
  -------------------------------------------------------------------
  
  0.004            0.000            0.000            [0.004, 0.004]

  Posterior quantiles:
  2.5            25             50             75             97.5
  |--------------|==============|==============|--------------|
  
  0.004          0.004          0.004          0.004          0.004


factor_returns_std:

  Mean             SD               MC Error         95% HPD interval
  -------------------------------------------------------------------
  
  0.167            0.060            0.006            [0.052, 0.243]

  Posterior quantiles:
  2.5            25             50             75             97.5
  |--------------|==============|==============|--------------|
  
  0.054          0.119          0.188          0.209          0.248

The trace plot for the factor_returns_std looks worrying.

What would be the best strategy to from this point? A longer trace?

Thanks!

I would say it might be the choice of prior. The factor_returns in your model is actually just a pm.Normal with shape=(T, n_factors), as they all have the same tau factor_returns_alpha * np.eye(n_factors). Could you try just using a pm.Normal, with the sd using a HalfCauchy as prior?

Also, try the auto init or init='advi+adapt_diag' (you might need to update to master first). They should perform better.

Changing the init not does seem to help.

advi

advi+adapt_diag

auto

Looks like ‘advi’ works the best.

I would say it might be the choice of prior. The factor_returns in your model is actually just a pm.Normal with shape=(T, n_factors), as they all have the same tau factor_returns_alpha * np.eye(n_factors). Could you try just using a pm.Normal, with the sd using a HalfCauchy as prior?

I changed:

factor_returns = pm.MvNormal(
    name    = 'factor_returns',
    mu      = 0,
    tau     = factor_returns_alpha * np.eye(n_factors),
    shape   = (T, n_factors),
    testval = np.random.randn(T, n_factors) * factor_returns_std
)

beta = pm.MvNormal(
    name    = 'beta',
    mu      = 1.0,
    tau     = beta_alpha * np.eye(n_factors),
    shape   = (n_stocks, n_factors),
    testval = np.random.randn(n_stocks, n_factors) * beta_std
)

to:

factor_returns = pm.Normal(
    name    = 'factor_returns',
    mu      = 0,
    sd      = factor_returns_std,
    shape   = (T, n_factors),
    testval = np.random.randn(T, n_factors) * factor_returns_std
)

beta = pm.Normal(
    name    = 'beta',
    mu      = 1.0,
    sd      = beta_std,
    shape   = (n_stocks, n_factors),
    testval = np.random.randn(n_stocks, n_factors) * beta_std
)

Results are slightly better now with 'advi':

Distribution plot for factor_returns_std looks better but the trace plot still not ideal.

Did you also change the prior for sd? Also, any warning during the sampling?

Also, could you please plot the trace with combined = false?

Did you also change the prior for sd?

In the above plots, no.

Exponential doesn’t seem that different than a HalfCauchy, but perhaps I am wrong?

Also, any warning during the sampling?

UserWarning: Chain 0 reached the maximum tree depth. Increase max_treedepth, increase target_accept or reparameterize.
UserWarning: The acceptance probability in chain 0 does not match the target. It is 0.128292122599, but should be close to 0.9. Try to increase the number of tuning steps.
UserWarning: Chain 1 reached the maximum tree depth. Increase max_treedepth, increase target_accept or reparameterize.
UserWarning: The acceptance probability in chain 1 does not match the target. It is 0.39805958331, but should be close to 0.9. Try to increase the number of tuning steps.
UserWarning: Chain 2 reached the maximum tree depth. Increase max_treedepth, increase target_accept or reparameterize.
UserWarning: The acceptance probability in chain 2 does not match the target. It is 0.80128442096, but should be close to 0.9. Try to increase the number of tuning steps.
UserWarning: Chain 3 reached the maximum tree depth. Increase max_treedepth, increase target_accept or reparameterize.
UserWarning: The acceptance probability in chain 3 does not match the target. It is 0.705833520306, but should be close to 0.9. Try to increase the number of tuning steps.
UserWarning: Chain 4 reached the maximum tree depth. Increase max_treedepth, increase target_accept or reparameterize.
UserWarning: The acceptance probability in chain 4 does not match the target. It is 0.758905225987, but should be close to 0.9. Try to increase the number of tuning steps.
UserWarning: Chain 5 reached the maximum tree depth. Increase max_treedepth, increase target_accept or reparameterize.
UserWarning: The acceptance probability in chain 5 does not match the target. It is 0.472832357998, but should be close to 0.9. Try to increase the number of tuning steps.

Also, could you please plot the trace with combined = false?

In this plot, I’m using HalfCauchy for priors:

factor_returns_std = pm.HalfCauchy(
    name = 'factor_returns_std',
    beta = 1,
)

idiosyncratic_std = pm.HalfCauchy(
    name = 'idiosyncratic_std',
    beta = 1,
)

HalfCauchy is more conventionally used as the prior for standard deviation.

I have the impression the trace of factor_return_std is slowing drifting towards .1, but because of the high target_accept it’s taking very small steps. So just a hunch maybe try to set the starting value to .1? You could do so by providing a test_val:

factor_returns_std = pm.HalfCauchy(name = 'factor_returns_std', beta = 1, test_val=.1)
1 Like

The problem seems to be the centered parametrization for factor_returns, it should work much better if you use a non-centered parametrization:

factor_returns_raw = pm.Normal(
    'factor_returns_raw', mu=0, sd=1, shape=(T, n_factors))
factor_returns = pm.Deterministic(
    'factor_returns', factor_returns_raw * factor_returns_std)

And probably the same for beta, too.

Otherwise: As junpenglao said, don’t use a MvNormal if you don’t have to. The results will be the same, but it is much slower.
Using sd instead of tau is considered best practice (unless you happen to have the precision matrix in a mv setting), most people think in terms of scale parameters, not precision. There really isn’t a speed difference.
The usual goto prior for scale parameters is HalfStudentT or HalfCauchy (or maybe HalfFlat). I’ve never seen anyone use an Exponential before, and I can’t think of a reason why you would. The prior for idiosyncratic_std probably doesn’t matter at all, there is enough data available anyway. But the prior on the scale of the beta might very well matter.
Sure you want to assume that idiosyncratic_std is the same for all stocks? That doesn’t sound right to me…

I hope that helps :slight_smile:

1 Like

Thanks, all the feedback has been really helpful!

Although I could be wrong, I don’t think the scale of beta matters as long as the scale of the factor returns is flexible. This is why I have:

beta_std = 1.0

factor_returns_std = pm.HalfCauchy(
    name    = 'factor_returns_std',
    beta    = 1,
    testval = data.std(axis=1).mean()
)

But maybe my intuition is incorrect and I should also model beta_std as a pm.HalfCauchy. Thoughts?

Yeah, I wanted to limit the complexity of the model, but I recently added this:

idiosyncratic_std = pm.HalfCauchy(
    name    = 'idiosyncratic_std',
    beta    = 1,
    shape   = n_stocks,
    testval = data.std(axis=1).mean()
)

I switched to the default target_accept with better results. Thanks!

This doesn’t help much. In the context of having init = 'advi', I feel that the testval doesn’t do much. Is this not correct?

Thank you so much for your help! That has all been very useful!

I’ve created a Github repository for this work. You can check it out here:

Please send me feedback if you have any. Thanks!

2 Likes

Also, what if I want to make the n_factors a discrete variable such as a DiscreteUniform? Is this possible/practical?