Comparing Theano values

My logp function depends on comparing model output to observed data (r == sr in the following function).

def logp(instances,h_epsilon,learning_iter,sub_path_length):
    max_trials=20
    lp=0
    trials=1
    for i,sr in zip(instances,sub_path_length):
        m_path_f = make_model_path_length(i)
        while True:
            r=m_path_f(h_epsilon,learning_iter)
            print('Model:{} Subject:{}'.format(r,sr))
            if r == sr or trials > max_trials:
                break
            trials+=1
        sample_data[(i.name,str(h_epsilon),str(learning_iter))]=trials
        lp+=sum([(1./t) for t in range(1,trials+1)])
        trials=1
    return -lp

The values of r and sr are Theano values, and I'm not sure this is how I'm supposed to compare them.
What is the right way to do it?

Here is the rest of my model:

class IVS(Discrete):
    def __init__(self,instances,h_epsilon,learning_iter, *args, **kwargs):
        super(IVS, self).__init__(*args, **kwargs)  # What does Discrete expect?
        self.instances = instances
        self.h_epsilon = h_epsilon
        self.learning_iter = learning_iter

    def logp(self, value):
        return logp(self.instances,self.h_epsilon,self.learning_iter,value)


def get_instances_by_subject(path_file,subject):
    with open(path_file,'rb') as f:
        reader=DictReader(f)
        data=[d['instance'] for d in reader if d['subject']==subject and d['complete']=='True']
        return sorted(data)

def get_paths_by_subject(path_file,instance_names,subject,fun):
    with open(path_file,'rb') as f:
        reader=DictReader(f)
        data=[(d['instance'],fun(d)) for d in reader if d['subject']==subject and d['instance'] in instance_names]
        return [fr for (x,fr) in sorted(data, key=lambda x: instance_names.index(x[0]))]
d=get_paths_by_subject(path_file,get_instances_by_subject(path_file,subject),subject,lambda x:float(x['human_length']))
sample_data={}


def make_model_path_length(i):
    @as_op(itypes=[tt.dscalar,tt.lscalar], otypes=[tt.lscalar])
    def model_path_length(h_epsilon,learning_iter):
        return len(LRTA(i,heur=lambda x: (1+h_epsilon)*min_manhattan_distance(x),update_h=True,iters=learning_iter)[0])
    return model_path_length


lrta_model=Model()
with lrta_model:
    learning_iter_=DiscreteUniform('learning_iter_',lower=1, upper=5)
    learning_iter=Deterministic('learning_iter',learning_iter_)
    h_epsilon_ = Uniform('h_epsilon_',lower=0.,upper=1.)
    h_epsilon = Deterministic('h_epsilon',h_epsilon_)
    instances=np.array([i for i in instance_set if i.name in get_instances_by_subject(path_file,subject)]) #is this sorted properly?
    path_length=IVS('path_length',instances,h_epsilon,learning_iter, observed=d)

You can do r.eval() to get the current value if it's a Theano tensor variable, so you can try r.eval() == sr.eval(). But it might work with r == sr as well.
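
For reference, here is a toy illustration (not your model) of when .eval() works: the expression's inputs need to be constants or shared variables so Theano has concrete values to compute with.

import theano
import theano.tensor as tt

a = theano.shared(9)        # a shared variable carries a value
b = tt.constant(9)
print(tt.eq(a, b).eval())   # prints 1, i.e. True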

PS: you can try the experimental SMC sampler; it would make a nice approximate Bayesian computation example.


Calling r.eval() raises an error:

raise MissingInputError(error_msg, variable=r)
theano.gof.fg.MissingInputError: Input 0 of the graph (indices start from 0), used to compute sigmoid(h_epsilon__interval__), was not provided and not given a value. Use the Theano flag exception_verbosity='high', for more information on this error.

I performed a check where 9 is compared with 9; it returns False.
Printing r, sr gives:

  FromFunctionOp{model_path_length}.0 , Subtensor{int64}.0

Could you also provide some data? A self-contained ipynb would be the easiest for us to have a look at.

Thanks.
The following code is self-contained and reproduces the problem.

from pymc3 import Model, DiscreteUniform, Uniform, Discrete, Deterministic
import theano.tensor as tt
from theano.compile.ops import as_op
import numpy as np
from csv import DictReader
from random import random

def make_fCalc(gF=1,hF=1,gAddition=1):
    def tmp_f(g,h,s):
        return (gF*(g+gAddition)+hF*h,g+gAddition,hF,s)
    tmp_f.__name__='{0}*(g+{1})+{2}h'.format(gF,gAddition,hF)
    return tmp_f


def LRTA(start,heur=lambda x:0,calcF=make_fCalc(),is_stop=lambda x:False,update_h=True,hcache={},iters=1):
    return [1 for _ in range(9)],[]



class IVS(Discrete):
    def __init__(self,instances,h_epsilon,learning_iter, *args, **kwargs):
        super(IVS, self).__init__(*args, **kwargs)  # What does Discrete expect?
        self.instances = instances
        self.h_epsilon = h_epsilon
        self.learning_iter = learning_iter

    def logp(self, value):
        print(value)
        return logp(self.instances,self.h_epsilon,self.learning_iter,value)

def make_model_path_length(i):
    @as_op(itypes=[tt.dscalar,tt.lscalar], otypes=[tt.lscalar])
    def model_path_length(h_epsilon,learning_iter):
        path,_= LRTA(i,heur=lambda x: (1+h_epsilon)*len(x),update_h=True,iters=learning_iter)
        print(len(path))
        return len(path)
    return model_path_length

# TODO: store per trial 
def logp(instances,h_epsilon,learning_iter,sub_path_length):
    max_trials=20
    lp=0
    trials=1
    for i,sr in zip(instances,sub_path_length):
        m_path_f = make_model_path_length(i)
        while True:
            r=m_path_f(h_epsilon,learning_iter)
            print('Model:{} Subject:{}'.format(r,sr))
            if r == sr or trials > max_trials:
                print(trials)
                break
            trials+=1
        lp+=sum([(1./t) for t in range(1,trials+1)])
        trials=1
    return -lp


lrta_model=Model()
with lrta_model:
    learning_iter=DiscreteUniform('learning_iter',lower=1, upper=5)
    h_epsilon = Uniform('h_epsilon',lower=0.,upper=1.)
    instances=np.array([0])
    path_length=IVS('path_length',instances,h_epsilon,learning_iter, observed=[9])

The attached code compares 9 with 9 and… fails.

Upon a closer look, I am not sure casting it to Theano really meets your purpose. How about removing the @as_op decorator in make_model_path_length(i)?

There are two problems with this:
First, it throws me back to the original problem, which was that Theano variables do not support <, >, >=, etc. I need them to support these, since I use them in the LRTA function (I attached a dummy implementation here).
Second, I just tried it (with the dummy) and the same problem persists.

Should I go back to PyMC2? Everything works fine there.
I just needed the LOOCV implementation, and PyMC2 doesn't have it. But it seems it might be easier to implement that statistic myself in PyMC2 than to get this to work in PyMC3.
This seems like extremely basic functionality to me; I just want to compare two ints that also support ordering :face_with_raised_eyebrow:

Theano does support comparisons. I don't understand what you are trying to do at the moment, so it's hard to tell you exactly what's going wrong.
The only thing you can't do as you normally would in Python is use the result of a comparison in an if statement or loop. The result of a comparison is still a symbolic value: whether or not it is true depends on the input, and the symbolic value itself always evaluates to True by the normal Python rules for bool(). You can get around that by using tt.switch, as in the sketch below. Maybe have a look at the Theano intro: https://github.com/pymc-devs/pymc3/blob/master/docs/source/theano.rst
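
For example, a minimal sketch (not your model) of a branch expressed symbolically with tt.switch instead of a Python if; tt.eq(a, b) is likewise the symbolic counterpart of ==:

import theano.tensor as tt

x = tt.dscalar('x')
y = tt.dscalar('y')

cond = tt.lt(x, y)                      # symbolic comparison, not a Python bool
smaller = tt.switch(cond, x, y)         # symbolic "x if x < y else y"
print(smaller.eval({x: 3.0, y: 5.0}))   # prints 3.0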


Oh I see. How about something like the implementation below? (1) remove the @as_op in make_model_path_length(i); (2) compare with r == sr.eval():

def make_fCalc(gF=1,hF=1,gAddition=1):
    def tmp_f(g,h,s):
        return (gF*(g+gAddition)+hF*h,g+gAddition,hF,s)
    tmp_f.__name__='{0}*(g+{1})+{2}h'.format(gF,gAddition,hF)
    return tmp_f


def LRTA(start,heur=lambda x:0,calcF=make_fCalc(),is_stop=lambda x:False,update_h=True,hcache={},iters=1):
    return [1 for _ in range(9)],[]



class IVS(Discrete):
    def __init__(self,instances,h_epsilon,learning_iter, *args, **kwargs):
        super(IVS, self).__init__(*args, **kwargs)  # What does Discrete expect?
        self.instances = instances
        self.h_epsilon = h_epsilon
        self.learning_iter = learning_iter

    def logp(self, value):
        return logp(self.instances,self.h_epsilon,self.learning_iter,value)

def make_model_path_length(i):
    def model_path_length(h_epsilon,learning_iter):
        path,_= LRTA(i,heur=lambda x: (1+h_epsilon)*len(x),update_h=True,iters=learning_iter)
        print(len(path))
        return len(path)
    return model_path_length

# TODO: store per trial 
def logp(instances,h_epsilon,learning_iter,sub_path_length):
    max_trials=20
    lp=0
    trials=1
    for i,sr in zip(instances,sub_path_length):
        m_path_f = make_model_path_length(i)
        while True:
            r=m_path_f(h_epsilon,learning_iter)
            print('Model:{} Subject:{}'.format(r,sr))
            if r == sr.eval() or trials > max_trials:
                print(trials)
                break
            trials+=1
        lp+=sum([(1./(t+1)) for t in range(trials)])
        trials=1
    return -lp


import pymc3 as pm  # needed for pm.sample below

lrta_model=Model()
with lrta_model:
    learning_iter=DiscreteUniform('learning_iter',lower=1, upper=5)
    h_epsilon = Uniform('h_epsilon',lower=0.,upper=1.)
    instances=np.array([0])
    path_length=IVS('path_length',instances,h_epsilon,learning_iter, observed=[9])
    tr=pm.sample(1000)

Well, in your case it might indeed be easier to do it in PyMC2, as there are only two 1-D RVs (and one of them is discrete). But if you can provide a bit more background information about the model and implementation, I believe there must be a way to port it to PyMC3 :sweat_smile:

Thanks. I honestly believe it does support comparisons; however, it does not work :wink: (the code is attached above, and unless I am using it wrong, which is very possible, it seems that PyMC3 is not the right tool here, because this turns out to be way over-complicated for a simple ==).

Why do you need to see the actual implementation? It's irrelevant to the comparison. It is just pure Python that does "whatever" with the input parameters and outputs a list, whose length is the value of the r variable.

I have tried using tt.switch and tt.ifelse. Neither helped here, since both return… again… a Theano variable.
I appreciate your help.

I can't help you with Theano without knowing more about what you are doing. Inferring what code that doesn't work is supposed to do isn't always easy, and from the looks of it I guess you are a bit confused about how this works. But if you have only a couple of variables, you probably don't need Hamiltonian methods or variational inference; plain Metropolis should be fine. And for Metropolis we don't need to compute gradients, which means we can avoid most of Theano.
Just add an @as_op decorator to your logp function (and specify the correct dtypes there). That way you get plain numpy arrays in logp instead of Theano variables, and with those you can do whatever you like (including loops and ordinary comparisons).
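
For illustration, a minimal sketch of that approach. This is not your model: simulate_path_length below is a hypothetical stand-in for the LRTA call (it always returns 9), and the logp itself is only schematic.

import numpy as np
import pymc3 as pm
import theano.tensor as tt
from theano.compile.ops import as_op


def simulate_path_length(h_epsilon, learning_iter):
    # Hypothetical stand-in for len(LRTA(...)[0]); replace with the real simulator.
    return 9


@as_op(itypes=[tt.dscalar, tt.lscalar, tt.lvector], otypes=[tt.dscalar])
def logp_op(h_epsilon, learning_iter, observed_lengths):
    # Inside an @as_op function the arguments are plain numpy values,
    # so ordinary ==, <, > and Python loops all work.
    max_trials = 20
    lp = 0.0
    for target in observed_lengths:
        trials = 1
        while simulate_path_length(h_epsilon, learning_iter) != target and trials <= max_trials:
            trials += 1
        lp += sum(1.0 / t for t in range(1, trials + 1))
    return np.asarray(-lp)


class IVS(pm.Discrete):
    def __init__(self, h_epsilon, learning_iter, *args, **kwargs):
        super(IVS, self).__init__(*args, **kwargs)
        self.h_epsilon = h_epsilon
        self.learning_iter = learning_iter

    def logp(self, value):
        return logp_op(self.h_epsilon, self.learning_iter, value)


with pm.Model():
    learning_iter = pm.DiscreteUniform('learning_iter', lower=1, upper=5)
    h_epsilon = pm.Uniform('h_epsilon', lower=0., upper=1.)
    path_length = IVS('path_length', h_epsilon, learning_iter,
                      observed=np.array([9, 9, 9], dtype='int64'))
    trace = pm.sample(1000, step=pm.Metropolis())

Note that the step method is given explicitly as Metropolis; NUTS would need gradients, which the @as_op wrapper does not provide.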


Perfect! This was what I needed.
I didn’t look into Theano because I understood I don’t need it.
But it wasn't clear from what I've read that you can use PyMC3 without it.