Custom distribution over subsets of a set

Assume I have a set \mathcal{S}. My parameter ranges over the possible subsets of \mathcal{S}, i.e. \{0,1\}^{|\mathcal{S}|}. I want to define a prior over this space such that each element of \mathcal{S} is included in the subset independently with probability p, i.e. each inclusion indicator is Bernoulli(p). Trying to implement this as a custom distribution, I have this:

class Subset(Discrete):
    """Distribution for subset prior"""
    _superset = None
    _bernoulli = None
    _p = None

    def __init__(self, superset, p, *args, **kwargs):
        super(Subset, self).__init__(*args, **kwargs)
        self._superset = superset
        self._bernoulli = Bernoulli("bernoulli", p=p)
        self._p = p

    def logp(self, value):
        total = 0;
        for i in value:
            total += self._bernoulli.logp(value[i])

        return total;

    def random(self, point=None, size=None):
        random_samples = []  # list of tuples, one per draw
        if size is None:
            size = 1
        for i in range(size):
            sample = []
            for element in self._superset:
                if self._bernoulli.random():
                    sample.append(1)
                else:
                    sample.append(0)
            random_samples.append(tuple(sample))  # store this draw
        return random_samples

logp should be evaluated over a subset, i.e. an element of \{0,1\}^{|\mathcal{S}|}. However, inside def logp(self, value) I get an error saying that I cannot iterate over value. Am I approaching this incorrectly?

This line

self._bernoulli = Bernoulli("bernoulli", p=p)

should be:

self._bernoulli = Bernoulli.dist(p=p)

Otherwise you are creating a model variable, and I assume you only want to extract the logp method.

value is a Theano symbolic variable, which cannot be iterated over the way a typical Python list can. Fortunately the Bernoulli logp works elementwise on vectors, so you don’t need to loop explicitly; summing the result gives the joint log-probability. This should do it:

def logp(self, value):
    total = self._bernoulli.logp(value).sum()
    return total

I am not sure what you meant by for i in value: value[i], though. My reply assumes this was a typo and that you meant for i in range(len(value)): value[i].
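As a sanity check on what the summed logp should return, here is a minimal plain-Python sketch. It does not use PyMC3 or Theano; the helper names bernoulli_logp, subset_logp and subset_logp_closed_form are made up for illustration. It assumes independent Bernoulli(p) indicators, in which case the sum of the elementwise log-probabilities should equal k*log(p) + (n-k)*log(1-p), where k is the number of ones and n is the length of the vector.

```python
import math


def bernoulli_logp(x, p):
    """Log-probability of a single 0/1 outcome under Bernoulli(p)."""
    return math.log(p) if x == 1 else math.log(1.0 - p)


def subset_logp(indicator, p):
    """Sum of elementwise log-probabilities over a 0/1 indicator vector."""
    return sum(bernoulli_logp(x, p) for x in indicator)


def subset_logp_closed_form(indicator, p):
    """Same quantity via k*log(p) + (n-k)*log(1-p), with k the number of ones."""
    k = sum(indicator)
    n = len(indicator)
    return k * math.log(p) + (n - k) * math.log(1.0 - p)


# Hypothetical example: a 5-element superset with elements 0, 2 and 3 selected.
indicator = [1, 0, 1, 1, 0]
p = 0.3

elementwise = subset_logp(indicator, p)
closed_form = subset_logp_closed_form(indicator, p)
print(elementwise, closed_form)
```

Evaluating the custom distribution's logp on the same indicator vector and comparing it against either helper is one way to confirm that the vectorized version behaves as intended.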
