Having problems defining random function for custom distribution with mixture of observed and random variables

I am a pymc newbee and I have been looking for examples of this but havent found any. I have no problem creating a custom function for survival analysis with the Gompertz function:

def survival(event, logalpha, beta, time):
    log_h = logalpha + beta * time
    H = (np.exp(logalpha)/beta) * (np.exp(beta * time) - 1)
    return ((event * log_h) - H).sum()

with pm.Model() as model:
        param, mu, y, sigma = {}, {}, {}, {}
        BoundedNormal = pm.Bound(pm.Normal, lower=-10, upper=0)
        BoundedNormal2 = pm.Bound(pm.Normal, lower=0, upper=0.5)
        BoundedNormal3 = pm.Bound(pm.Normal, lower=-2, upper=2)
        BoundedHalfNormal = pm.Bound(pm.HalfNormal, lower=0, upper=2)
        for curr_feature in bn.nodes():
            if curr_feature == 'S':
                logalpha = BoundedNormal('logalpha', sd=10)
                beta = BoundedNormal2('beta', sd=10)
                S = pm.DensityDist('S', survival,
                                   observed={'event': data['status'],
                                             'logalpha': logalpha,
                                             'beta': beta,
                                             'time': data['Survival Time']/365
with model:
        trace = pm.sample(2000, tune=1000)

This runs perfectly. However, I would also like to do some posterior predictions for new data. I am having problems with defining the random function for the Gompertz function. I am assuming I need to draw logalpha and beta and take event and time from the test data. How do I do this? How should I pass event and time to this function? The following gives me errors:

def survival_random(point=None, size=None):
                    # draw a numerical value for the parameters
                    logalpha_, beta_ = draw_values([loagalpha, beta], point=point)
                    size = 1 if size is None else size
                    return generate_samples(survival, logalpha=logalpha_, beta=beta_, size=size, broadcast_shape=(size,))

This gives the following error:
TypeError Traceback (most recent call last)
----> 1 model = run_model(bn, data_shared)
2 with model:
3 trace = pm.sample(10000, tune=2000, cores=4)

<ipython-input-17-e9ddcacb62db> in run_model(bn, data, columns)
     52                 S = pm.DensityDist('S', survival(logalpha, beta, data[:, i], data[:, j]/365),
---> 53                                    random=random_survival
     54                                    )
     55             elif curr_feature == 'logalpha':

~/pymc3/lib/python3.7/site-packages/pymc3/distributions/distribution.py in __new__(cls, name, *args, **kwargs)
     40             total_size = kwargs.pop('total_size', None)
     41             dist = cls.dist(*args, **kwargs)
---> 42             return model.Var(name, dist, data, total_size)
     43         else:
     44             raise TypeError("Name needs to be a string but got: {}".format(name))

~/pymc3/lib/python3.7/site-packages/pymc3/model.py in Var(self, name, dist, data, total_size)
    806                 with self:
    807                     var = FreeRV(name=name, distribution=dist,
--> 808                                  total_size=total_size, model=self)
    809                 self.free_RVs.append(var)
    810             else:

~/pymc3/lib/python3.7/site-packages/pymc3/model.py in __init__(self, type, owner, index, name, distribution, total_size, model)
   1206             self.tag.test_value = np.ones(
   1207                 distribution.shape, distribution.dtype) * distribution.default()
-> 1208             self.logp_elemwiset = distribution.logp(self)
   1209             # The logp might need scaling in minibatches.
   1210             # This is done in `Factor`.

TypeError: 'TensorVariable' object is not callable

Thanks for helping me out with this in advance.

Hmmm, maybe an easier way to do is to extract the logalpha and beta from the posterior (i.e., trace), and build the posterior prediction logalpha + beta * time, and logistic transformed it and generate Bernoulli random?

Thanks for your answer @junpenglao. The problem I am trying to solve is to predict alpha and beta given a set of covariates. Also, it is essential to predict alpha and beta separately as one is proportional to fitness of individual at t=0 and beta is related to “aging” rate of the individual (i.e., the rate at which fitness reduces with time). Ideally, I would like to have both alpha and beta individually rather than just alpha + beta*time.

Is there no way to do this with pymc? In general, it would be great to have a few examples of how random function is written for custom functions within pymc (or maybe I just haven’t found them). For example, could I make logalpha and beta observed but make all of it missing values?

Again, thanks in advance.

You already have alpha and beta individual from the posterior trace. The question is to generate random sample of S according to the Density function.
If a specific random generation algorithm is not known, I guess you can always implement a reject sampler or an important sampler. There is no example for it in pymc3, so if you got something working please share with us :wink:

@junpenglao I think I understand what you are saying now. In an ideal world, we should treat this as a Poisson process with time-dependent probabilities. Can you please help me figure out how we can implement what you have said in the random function given that time is part of the input data and logalpha and beta are latent parameters? I am struggling with writing the right random function for this particular situation.

Since your logp of the observed is:

def survival(event, logalpha, beta, time):
    log_h = logalpha + beta * time
    H = (np.exp(logalpha)/beta) * (np.exp(beta * time) - 1)
    return ((event * log_h) - H).sum()

what I would do (not sure if it is correct or not!) is get a temporal function of the latent alpha and beta:

def survival2_random(logalpha, beta, time):
    log_h = logalpha + beta * time
    H = (np.exp(logalpha)/beta) * (np.exp(beta * time) - 1)
    logp_temporal = np.stack([(0. * log_h) - H, (1. * log_h) - H]) <== observed being 0 or 1
    p_temporal = np.exp(logp_temporal)
    p_norm = p_temporal/p_temporal.sum(axis=1, keepdim=True)
    return np.random.bernoulli(p_norm)

# and call it like:
for point in trace:
    randomsample = survival2_random(point['logalpha'], point['beta'], time)

I did not run the code so there might be mistake :wink: