I am trying to set up a truncated Beta process factor analysis using a stick-breaking construction, as in this paper: Beta Process Factor Analysis. The analysis is in the same vein as the pymc3 Dirichlet process example, just applied to a Beta process.
In short, the Beta process places a prior over binary matrices Z of dimensions (num_observations × num_factors). Each observation is assumed to be produced from a linear combination of the factors, selected according to the entries of Z - specifically, if Z_{ij} = 1, then factor j contributes to observation i. Thus, observation i can be written as \text{Obs}_i = \sum_{j} Z_{ij} \, \text{factor}_j.
There's a Beta prior on the factor probabilities \pi_k (the probabilities underlying the 0/1 entries of Z) with parameters a and b, taken from the paper linked above. With K factors in the data, the factor probabilities are drawn as \pi_k \sim \text{Beta}\left(\frac{a}{K}, \frac{b(K-1)}{K}\right).
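Putting the pieces together, the generative model I'm after is:

\pi_k \sim \text{Beta}\left(\tfrac{a}{K}, \tfrac{b(K-1)}{K}\right), \quad Z_{ik} \sim \text{Bernoulli}(\pi_k), \quad \text{Obs}_i = \sum_{k=1}^{K} Z_{ik} \, \text{factor}_k + \epsilon_i, \quad \epsilon_i \sim \mathcal{N}(0, \sigma^2)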
Here's some code that generates dummy data - I drew the factors themselves i.i.d. from \mathcal{N}(5, 2). There are 100 observations and 10 factors in this dummy dataset.
import numpy as np
import pymc3 as pm
import theano.tensor as tt

# Data generation:
# 100 observations, each produced from a linear combination of 10 latent factors.
# A binary indicator array (100 x 10) marks which latent factors are "on" for each observation,
# with a (finite-approximation) Beta process prior over that array.
a = 5
b = 10
K = 10   # number of latent factors
N = 100  # number of observations

# Factor probabilities pi_k ~ Beta(a/K, b(K-1)/K), shared across all observations
data_p = np.random.beta(a / K, b * (K - 1) / K, size=K)
data_p = np.tile(data_p[None, :], (N, 1))

# Bernoulli indicator array of latent factors for all observations
data_Z = np.random.binomial(n=1, p=data_p)

# The latent factors themselves, drawn i.i.d. from N(5, 2)
latent_features_data = np.random.normal(loc=5, scale=2, size=K)

# Observations are linear combinations of the "on" factors, plus unit-variance noise;
# multiply the indicator array by the factors and sum along the factor axis
data = np.sum(data_Z * latent_features_data[None, :], axis=-1) + np.random.normal(loc=0, scale=1, size=N)
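As a quick sanity check on shapes and on how sparse the indicator array comes out (the Beta mean here is a / (a + b(K-1)) = 5/95 ≈ 0.05):

print(data_Z.shape)   # (100, 10)
print(data.shape)     # (100,)
print(data_Z.mean())  # fraction of "on" entries, roughly 0.05 in expectation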
And here's my pymc3 model. This is the most naive implementation, and it necessitates BinaryGibbsMetropolis sampling of the Bernoulli Z. The performance isn't very good: even after a few thousand tuning steps, NUTS has a lot of divergences and prompts me to increase target_accept, and the sampling is nowhere close to converging.
# pymc3 model
with pm.Model() as model:
    # Params for the Beta distribution over factor probabilities
    a = pm.Gamma("a", 1, 1)
    b = pm.Gamma("b", 1, 1)
    p = pm.Beta("p", a / K, b * (K - 1) / K, shape=K)
    # Bernoulli indicator matrix of which factors are "on" for each observation
    Z = pm.Bernoulli("Z", p=tt.tile(p[None, :], (N, 1)), shape=(N, K))
    # Latent factors - K of them
    factors = pm.Normal("factors", mu=0, sd=1, shape=K)
    # Observations are linear combinations of the factors
    equation = tt.sum(Z * tt.tile(factors[None, :], (N, 1)), axis=-1)
    sd = pm.HalfCauchy("sd", 0.5)
    obs = pm.Normal("obs", mu=equation, sd=sd, observed=data)

with model:
    tr = pm.sample(tune=4000, draws=2000, njobs=3)
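For reference, the compound step can also be assigned explicitly; as far as I understand, this is equivalent to what pm.sample picks automatically (BinaryGibbsMetropolis for the discrete Z, NUTS for everything else):

# Explicit step assignment - should match pm.sample's automatic choice
with model:
    step_Z = pm.BinaryGibbsMetropolis([Z])
    tr = pm.sample(tune=4000, draws=2000, step=[step_Z], njobs=3)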
Any tips/advice would be very much appreciated!