I am doing a generalized linear regression with a Negative Binomial likelihood. This problem differs from a typical linear model, where the x and y values are separate arrays: I am looking at time series data, and for each value in the series I regress on the previous W values. I would like to define a custom distribution that takes in an array of counts, performs this manipulation, and computes the likelihood. This is my code:
```python
import numpy as np
import pymc3 as pm

# W (window size) and counts (1-D array of observed counts) are defined elsewhere.

class MyNegBin(pm.Discrete):
    def __init__(self, W, intercept, beta, beta_recent, alpha, *args, **kwargs):
        super(MyNegBin, self).__init__(*args, **kwargs)
        self.W = W
        self.intercept = intercept
        self.beta = beta
        self.beta_recent = beta_recent
        self.alpha = alpha

    def logp(self, counts):
        W = self.W
        N = len(counts) - W
        # Lag matrix: row i holds the W values preceding counts[W + i].
        x = np.vstack([counts[i:i+W] for i in range(N)])
        y = counts[W:]
        mu = self.intercept
        for i in range(W-1):
            mu += self.beta[i] * x[:, i]
        mu += self.beta_recent * x[:, W-1]
        lik = pm.NegativeBinomial.dist(mu=mu, alpha=self.alpha)
        return lik.logp(y)

with pm.Model() as model:
    intercept = pm.Normal('intercept', mu=0, sd=10)
    beta = pm.Normal('beta', mu=0, sd=1, shape=W-1)
    beta_recent = pm.Normal('beta_recent', mu=1, sd=1)
    alpha = pm.TruncatedNormal('alpha', mu=2000, sigma=500, lower=0)
    Y_obs = MyNegBin('Y_obs', W, intercept, beta, beta_recent, alpha, observed=counts)
```
The problem is that logp() is written for a univariate distribution, so it is expected to take in a scalar rather than an array. Is there any way to work around this?
You mean NegativeBinomial only accepts a vector mu? I think we will need to modify its logp as well. What is the final shape of mu?
NegativeBinomial takes two parameters, mu and alpha. The final shape of mu is (N,). So I am plugging in an array of N mu values and would like to return the total log likelihood, i.e., the sum of the log likelihoods for each mu. Is there some way to define a distribution whose logp() takes an array (i.e., counts) rather than a scalar/constant?
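To illustrate with made-up numbers, the reduction I have in mind is just this (sketch only, outside any model; the mu and alpha values are arbitrary):

```python
import numpy as np
import pymc3 as pm

mu = np.array([8.0, 10.0, 12.0])                     # one mu per regression target
dist = pm.NegativeBinomial.dist(mu=mu, alpha=2000.0)  # unnamed vector distribution
total_logp = dist.logp(np.array([7, 11, 12])).sum()   # sum of elementwise log-likelihoods
print(total_logp.eval())
```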
Hmmm, this seems to be fine?
```python
with pm.Model():
    pm.NegativeBinomial('x', np.ones(10), np.ones(10), shape=10)
```
Maybe try specifying the shape in `pm.NegativeBinomial.dist(mu=mu, alpha=self.alpha, shape=...)`
The issue I’m having is that I would like to pass in an array of counts and then perform the manipulation

```python
x = np.vstack([counts[i:i+W] for i in range(N)])
y = counts[W:]
```

so that each row of x corresponds to an element of y for the linear regression. So ideally I would be able to define a multivariate distribution over the entire array that performs this manipulation to get the x and y data before calculating the likelihood. Does this make sense? Is there a way to define a custom multivariate distribution?
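Concretely, on a toy array the construction looks like this (illustration only; the numbers are made up):

```python
import numpy as np

counts = np.array([3, 1, 4, 1, 5, 9, 2, 6])
W = 3
N = len(counts) - W                               # 5 regression targets
x = np.vstack([counts[i:i+W] for i in range(N)])  # shape (N, W): lagged windows
y = counts[W:]                                    # shape (N,): targets
# x[0] == [3, 1, 4] is the window that predicts y[0] == 1, and so on.
```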
I figured it out! Thank you!
Great to hear @ally-lee! Do you mind posting the solution here and marking it as “solution”? I’m sure it would help future readers.
My original solution wasn’t working because I had defined my distribution as a subclass of pm.Discrete, which defines univariate distributions. After reading more of the documentation, I realized that the best way to do this was to define a custom likelihood function that takes all of my model’s hyperparameters as arguments and returns a function computing the likelihood for a given data set. I could then use DensityDist to define a distribution with my custom likelihood.
```python
def likelihood(W, N, intercept, beta, beta_recent, alpha):
    def logp_(counts):
        # counts arrives as a Theano tensor; pull out the observed values.
        counts = counts.eval()
        # Lag matrix: row i holds the W values preceding counts[W + i].
        x = np.vstack([counts[i:i+W] for i in range(N)])
        y = counts[W:]
        mu = intercept
        for i in range(W-1):
            mu += beta[i] * x[:, i]
        mu += beta_recent * x[:, W-1]
        lik = pm.NegativeBinomial.dist(mu=mu, alpha=alpha)
        return lik.logp(y)
    return logp_

with pm.Model() as model:
    intercept = pm.Normal('intercept', mu=0, sd=10)
    beta = pm.Normal('beta', mu=0, sd=1, shape=W-1)
    beta_recent = pm.Normal('beta_recent', mu=1, sd=1)
    alpha = pm.TruncatedNormal('alpha', mu=2000, sigma=500, lower=0)
    Y_obs = pm.DensityDist('Y_obs',
                           likelihood(W, len(counts) - W, intercept, beta, beta_recent, alpha),
                           observed=counts)
```
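With the likelihood wired up, sampling works as usual, e.g. (the draw and tuning counts here are just illustrative):

```python
with model:
    trace = pm.sample(1000, tune=1000)
```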