# Categorical model with continuous dependent variable

I am having trouble setting up a Categorical model with the dependence that interests me.

For background, I have a probability which I observe to be an exponentially decaying function of some continuous observable parameter b. So I have a series [b, s] where I observe a binary state s given the continuous parameter b. I am not sure that a logistic regression will correctly capture the dependence on b that I am seeing, so I want to try another approach.

So I want to try modeling the probability as a beta function with a mean that depends on b.

Regardless of whether or not this will be an effective model, I should be able to set it up in pymc3, but I’m running into trouble. In reality I have thousands of samples, which is why I would rather have a continuous dependence on b rather than a bucketed dependence.

Here is a toy example of what I’m trying to do:

```python
import numpy as np
import pandas as pd
import pymc3 as pm
import theano.tensor as tt

b = np.array([100, 120, 150, 200, 1000, 10000, 15000])
s = np.array([1, 0, 1, 1, 0, 0, 0])
s = pd.Categorical(s).codes

model = pm.Model()
with model:
    sd = pm.HalfNormal('sd', sd=1)

    l1 = pm.HalfNormal('l1', sd=100)
    drift = tt.exp(-l1 * b)

    p1 = pm.Beta.dist(drift, sd)

    obs = pm.Categorical('obs', p=tt.stack([p1, 1. - p1]), observed=s)
```

The error I observe is

``TypeError: unsupported operand type(s) for -: 'float' and 'Beta'``

Do you mean the beta function or the beta distribution? A log-beta function is available as `pm.distributions.dist_math.betaln`. If you are trying to get a beta distribution, you should use `p1 = pm.Beta('p1', drift, sd)` rather than `pm.Beta.dist(...)`.
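The two are easy to confuse, so here is a quick illustration of the distinction with scipy (not pymc3-specific): the beta *function* B(a, b) is just a number, while the beta *distribution* is a random variable on (0, 1).

```python
import numpy as np
from scipy.special import betaln  # log of the beta *function* B(a, b)
from scipy.stats import beta      # the beta *distribution*

# Beta function: B(1, 1) = 1, so its log is 0
print(np.exp(betaln(1.0, 1.0)))  # 1.0

# Beta distribution: a random variable on (0, 1); its mean is a / (a + b)
print(beta(2.0, 3.0).mean())     # 0.4
```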

A beta distribution.

When I make the change that you suggest, the error I get is

```
The error when converting the test value to that variable type:
Wrong number of dimensions: expected 0, got 1 with shape (7,).
```

This shows up when constructing `obs`, but seems to be coming from the drift term.
Do I need to transpose my data? But it is a 1-D array. And I don’t want a separate Beta for every sample.

If `s` only takes the binary values 0 and 1, you should model it as a Bernoulli:
`obs = pm.Bernoulli('obs', p=p1, observed=s)`

I wanted to start with a simple case, so `s` is only binary right now; I will add more states later. With Bernoulli, I am still getting the same error though. Here is my new test case with the suggested changes.

```python
import numpy as np
import pandas as pd
import pymc3 as pm
import theano.tensor as tt

b = np.array([100, 120, 150, 200, 1000, 10000, 15000])
s = np.array([1, 0, 1, 1, 0, 0, 0])
s = pd.Categorical(s).codes

model = pm.Model()
with model:
    l1 = pm.HalfNormal('l1', sd=100)
    drift = tt.exp(-l1 * b)
    sd = pm.HalfNormal('sd', sd=1)

    p1 = pm.Beta('p1', drift, sd)

    obs = pm.Bernoulli('obs', p=p1, observed=s)
```

Have you tried setting `shape=7` on `p1`?

@gBokiau is right: you should set `shape=7` on `p1`.

However, since you said earlier:

> And I don’t want a separate Beta for every sample.

you should model `p` differently, and have it multiply with `b` later on.

Hmmm. Is this a fundamental math problem, or just a limitation of pymc3?

So suppose I have many samples of some substance which changes state spontaneously. Like time bombs. All my samples have different ages. I want to estimate the half-life of my samples. So I have a series [a, s] where age is a continuous variable, and s is either “live” or “exploded”. How should I set up the calibration of the half-life, in pymc3?
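As a sanity check on the parameterization (plain numpy, the names are my own): the probability that a sample of age `a` is still live under half-life `t_half` is `(1/2)**(a / t_half)`, equivalently `exp(-a * ln(2) / t_half)`.

```python
import numpy as np

def p_live(age, t_half):
    """Survival probability at age `age` given half-life `t_half`:
    p = (1/2)**(age / t_half), i.e. exp(-age * ln(2) / t_half)."""
    return 0.5 ** (age / t_half)

ages = np.array([0.0, 10.0, 20.0])
print(p_live(ages, t_half=10.0))  # [1.0, 0.5, 0.25]
```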

One thought is to define a new quantity c(A) = count(s = 1 & a < A) / count(a < A) and fit that directly?
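That empirical quantity is at least cheap to compute as a diagnostic; a minimal sketch in plain numpy (the function name is my own):

```python
import numpy as np

def c(A, ages, s):
    """Fraction of samples younger than A that are in state s == 1."""
    mask = ages < A
    return np.count_nonzero(s[mask] == 1) / np.count_nonzero(mask)

ages = np.array([100, 120, 150, 200, 1000, 10000, 15000])
s    = np.array([  1,   0,   1,   1,    0,     0,     0])
print(c(250, ages, s))  # 3 of the 4 samples younger than 250 have s == 1
```

Note that fitting c(A) directly throws away the per-sample likelihood, so it is probably better as a cross-check than as the fit itself.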

I also know that logistic regression is often used for modeling probabilities, but it’s not clear to me that it would be a natural model for a half-life (or, what I’m after next, a linear combination of half-lives).

In your setting, you are trying to find a function `f` which maps `f(a) -> p`, with `pi(s|a) = Bernoulli(p) = Bernoulli(f(a))`.

In a logistic-regression setting, `f(a)` is a logistic transformation of a linear function `X*beta`.
In your example above, `f(a)` is a stochastic function that maps `exp(-l1*b)` and `sd` into `p`. Since `exp(-l1*b)` takes an array as input, the final output `p1` is also an array (a 1-D tensor).

I hope this clears things up for you a bit.
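To make the contrast concrete (plain numpy, with made-up coefficients): both links map `b` to a probability in (0, 1); they just have different shapes.

```python
import numpy as np

b = np.array([100.0, 200.0, 1000.0, 10000.0])

# Logistic link: p = sigmoid(beta0 + beta1 * b); beta0, beta1 are illustrative
beta0, beta1 = 1.0, -0.001
p_logistic = 1.0 / (1.0 + np.exp(-(beta0 + beta1 * b)))

# Exponential-decay link: p = exp(-l1 * b); l1 is an illustrative rate
l1 = 0.001
p_exp = np.exp(-l1 * b)

# Both are valid Bernoulli probabilities
assert np.all((p_logistic > 0) & (p_logistic < 1))
assert np.all((p_exp > 0) & (p_exp < 1))
```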

Right, except it’s f(b). Sorry for renaming b -> a.

If I really do need a separate Beta for every sample, that is a lot of betas!

Yes, that’s a lot of Betas. Usually you should infer the parameters of the function `f()` (`l1` and `sd`) instead of the output `p`.
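Outside of pymc3, the same idea — infer `l1` rather than one `p` per sample — can be sketched as a maximum-likelihood fit with scipy on synthetic data (all names and values here are my own, for illustration):

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)

# Synthetic data generated with a known decay rate l1 = 0.002
true_l1 = 0.002
b = rng.uniform(0, 2000, size=5000)
s = rng.random(5000) < np.exp(-true_l1 * b)   # s ~ Bernoulli(exp(-l1 * b))

def neg_log_lik(l1):
    """Negative Bernoulli log-likelihood with p(b) = exp(-l1 * b)."""
    p = np.clip(np.exp(-l1 * b), 1e-12, 1 - 1e-12)  # avoid log(0)
    return -np.sum(s * np.log(p) + (~s) * np.log1p(-p))

res = minimize_scalar(neg_log_lik, bounds=(1e-6, 1.0), method='bounded')
print(res.x)  # recovered rate, close to 0.002
```

The Bayesian version in pymc3 is the same structure: put a prior on `l1`, make `p` a deterministic transformation of `l1` and `b`, and feed that `p` into the observed Bernoulli.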