How to deal with a variable size of the 'observed' variable

Hello,
In my model, the parameter I am trying to evaluate influences the size (length) of the array of the observed variable. Think of it as a parameter that makes a time-dependent function reach a goal faster or slower depending on that parameter, hence the results length vary a little.

As I was mentioning in my other question in my test with fake data, I have a parameter whose real values is 6000, and so I start with a Normal distribution at 5500, with an sd=1000. Thanks to @twiecki I have put pm.Metropolis(scaling=100) and this helps in searching the domain, to hopefully reach the 6000 value.

Nevertheless, when the Metropolis sampler starts, my observed variable Y has a size of 1186 (corresponding to the parameterā€™s value of 6000). Now when I try different values of this parameter, the size of the observation changes slightly, going to 1159 for instance, and then at the next trying point, 1164 for instance. At the second step I get the a Theano error about dimensions mismatch: ValueError: Input dimension mis-match. (input[3].shape[0] = 1159, input[4].shape[0] = 1164)

Here is how my code is setup, I tried to do the following trick at the end in my @as_op function proc_test(alpha):

    length_obs=len(obs)
    return(np.asarray(obs[:min(np.shape(Y)[1],length_obs)]))

and then the same trick for the likelihood:

# Priors for unknown model parameters
alpha = pm.Normal('alpha', mu=5500, sd=1000) #real is 6000
sigma = pm.HalfNormal('sigma', sd=1000)

# Expected value of outcome
test = proc_test(alpha)

# Likelihood of observations
Y_obs = pm.Normal('Y_obs', mu=test, sd=sigma, observed=Y[0,:min(np.shape(Y)[1],length_obs)])

# Inference
step = pm.Metropolis(scaling=100)
start={'alpha':5500, 'sigma':1}
trace = pm.sample(100,step=step,start=start)

but as I said it returns an error after the second step. I checked with the debugger and I feel like once length_obs is fixed in the likelihood definition (by the start value, at 1159 in my case), it is not really changed to 1164 like in a loop for instance.

Is there a way to deal with this kind of situation? I could not find anything so far.
Thanks!

Bumping my own thread, since it did not get a lot of views and no answers so far.
Is there anything I can add to give more details maybe?
I cannot believe Iā€™m the only one facing this difficulty, or I must have a deep misunderstanding of how things should work, in which case Iā€™ll be happy to look at a new approach to the problem :slight_smile:

What do you mean by ā€œthe size of the observation changesā€? Iā€™d have said the observation is an observation, it canā€™t change. :slight_smile:
Could you maybe talk a bit about what you are modeling and post the whole code?

Hello,

Unfortunately I canā€™t post the model and/or code, otherwise I would have done it already, and my problem would probably have been resolved :smiley:

As I mentioned, I have an external model of a given system (in C), which I am trying to invert, i.e. do parameter estimation on it. The parameter I am interested in at first is alpha.
To that end I make a call to this model in an as_op function proc_test(alpha). This function which takes the parameter of interest alpha, and produces an output that is an array (that is my observation). The observation is a time series, and therefore has a certain length. This length, corresponding to the duration of a phenomenon, is not fixed, but depends a little bit on alpha. This is why I have a variable-size observation.

What I am trying to do, and I am sure it can be done, it to use the Metropolis sampler to estimate the parameter alpha. The way I am doing it is maybe not how it should be done though, but I am trying.

I now realize I did not really copy-paste well from my code to the first post, so there was an error. Here is how I do things currently, and more details :

# my fake data (observation) is stored in a variable Y
# it has a fixed size (=1186 here) corresponding to a value of alpha=6000, the value I am trying to find with Metropolis

...

# function that calls my external model and returns the observation
@theano.compile.ops.as_op(itypes=[t.dscalar],otypes=[t.dvector])
def proc_test(alpha):
    
    ...

    # call of my external model
    subprocess.Popen("MYCODE.exe", shell=True).wait()

    #then I read the output from a file which fills 'obs'

    length_obs=len(obs)
    return(np.asarray(obs[:min(np.shape(Y)[1],length_obs)])) #my try so far at dealing with the length problem

# Estimation
with pm.Model() as model:
    # Priors for unknown model parameters
    alpha = pm.Normal('alpha', mu=5500, sd=1000) #real is 6000
    sigma = pm.HalfNormal('sigma', sd=1000)

    # Expected value of outcome
    test = proc_test(alpha)

    # Likelihood of observations
    Y_obs = pm.Normal('Y_obs', mu=test, sd=sigma, observed=Y[0,:min(np.shape(Y)[1],length_obs)]) #same trick as above for the length

    # Inference
    step = pm.Metropolis(scaling=100)
    start={'alpha':5500, 'sigma':1}
    trace = pm.sample(100,step=step,start=start)

Then it returns an error after the second proc_test evaluation (I have a counter). I checked with the debugger and I feel like once length_obs is fixed in the likelihood definition (by the start value, at 1159 in my case, for alpha=5500), it is not really changed afterwards to match the new length value generated by the new alpha that Metropolis tries (which is neither 5500 nor 6000, but somewhere close to that) and for instance it can happen that the length of observations is 1164. I then get ValueError: Input dimension mis-match. (input[3].shape[0] = 1159, input[4].shape[0] = 1164).
It seems that my trick doesnā€™t work because once length_obs is fixed in the likelihood definition, this definition is not updated like in a loop.

So I was wondering what kind of trick I could use to solve that?

Thanks in advance!

Ah, I think I get it now, thanks. The observation is Y, the size of the prediction is changing, right?
I think the easiest solution would be to move the likelihood into the proc_test computation. Something along the lines of this:

def proc_test(alpha, sigma):
    # run the external program and write results into `predicted`
    ...
    observed_ = Y[:, min(Y.shape[1], len(predicted))]
    diff = (observed_ - predicted) / sigma
    return -0.5 * (diff * diff + np.log(2*np.pi) * sigma)

Then, you can use pm.DensityDist to get the likelihood into the model.

Exactly, thanks for the suggestion!

I input what you proposed in my proc_test function, and now Iā€™m putting this in my with pm.Model() as model:

    # Expected value of outcome
    test = proc_test(alpha, sigma)

    # Likelihood (sampling distribution) of observations
    Y_obs = pm.DensityDist('Y_obs', logp=test, observed=Y)

But I get TypeError: 'TensorVariable' object is not callable.
I tried different syntaxes such as directly logp=proc_test or logp=proc_test(alpha,sigma) but I always get the same error. Any ideas? It feels weird to put observed=Y since weā€™re now looking at a sub-sample of Y in some cases for instance, but Iā€™m not sure.
I have to admit that I am not familiar yet with DensityDist also.

You might find this post helpful where I was trying to explain the mechanism of DensityDist.

Thanks,

I used the dictionary syntax for observed, and I have this slightly modified code now :

@theano.compile.ops.as_op(itypes=[t.dscalar,t.dscalar],otypes=[t.dvector])
def proc_test(Y,alpha,sigma):
    
    ...
    
    predicted= ... #what I had defined as obs before
    observed_ = Y[:, min(Y.shape[1], len(predicted))]
    diff = (observed_ - predicted) / sigma
    return -0.5 * (diff * diff + np.log(2*np.pi) * sigma)
  
with pm.Model() as model:

    # Priors for unknown model parameters
    alpha = pm.Normal('alpha', mu=5500, sd=1000) #real is 6000
    sigma = pm.HalfNormal('sigma', sd=1000)

    # Likelihood (sampling distribution) of observations
    Y_obs = pm.DensityDist('Y_obs', proc_test, observed={'Y':Y,'alpha':alpha,'sigma':sigma})

    # Inference!
    step = pm.Metropolis(scaling=100)

    start={'alpha':5500, 'sigma':1}

    trace = pm.sample(100,step=step,start=start)

But I get now a TypeError: make_node() got an unexpected keyword argument 'Y'. Not sure what is wrong again here, because it looks like I have the same syntax at all steps as the code in the thread you linked.

You need to specify the input for Y as well:

@theano.compile.ops.as_op(itypes=[t.dvector,t.dscalar,t.dscalar],otypes=[t.dvector])

Also, I think you should return a scaler instead of a vector:

    return np.sum(-0.5 * (diff * diff + np.log(2*np.pi) * sigma))

Hello,

Yes, clearly I needed to change that,
I now have @theano.compile.ops.as_op(itypes=[t.dvector,t.dscalar,t.dscalar],otypes=[t.dscalar])
nevertheless I still get the exact same error (which is why I oversaw this simple mistake)ā€¦

I did some tests, and it seems that the keyword argument in the error is the first one that is passed in observed. If I change the order of the dictionnary, by putting alpha first, it yields
TypeError: make_node() got an unexpected keyword argument 'alpha'

Going into the code of op.py it seems that the kwargs are maybe confusing make_node() here.
I tried to remove the as_op of my proc_test function, then it runs without this error, but instead of using the value of alpha it sees it as a string in the function, which of course produces errors later on.

Iā€™m not sure if Iā€™m using this DensityDist the right wayā€¦

Hello,

I looked at this make_node() error a little more, but could not find anything that would make it disappear.
I was wondering if anyone had more ideas?

Here is the code with the last remarks from @junpenglao

@theano.compile.ops.as_op(itypes=[t.dscalar,t.dscalar,t.dvector],otypes=[t.dscalar])
def proc_test(alpha,sigma,Y):
    
    ...
    
    predicted= ... #what I had defined as obs before
    observed_ = Y[:, min(Y.shape[1], len(predicted))]
    diff = (observed_ - predicted) / sigma
    return np.sum(-0.5 * (diff * diff + np.log(2*np.pi) * sigma))
  
with pm.Model() as model:

    # Priors for unknown model parameters
    alpha = pm.Normal('alpha', mu=5500, sd=1000) #real is 6000
    sigma = pm.HalfNormal('sigma', sd=1000)

    # Likelihood (sampling distribution) of observations
    Y_obs = pm.DensityDist('Y_obs', proc_test, observed={'alpha':alpha,'Y':Y,'sigma':sigma})

    # Inference!
    step = pm.Metropolis(scaling=100)

    start={'alpha':5500, 'sigma':1}

    trace = pm.sample(100,step=step,start=start)

I still get TypeError: make_node() got an unexpected keyword argument 'alpha'.
And to make it clear, if I change the order of the terms in the DensityDist dictionnary parameters, putting for instance Y in first poistion, Iā€™ll get TypeError: make_node() got an unexpected keyword argument 'Y'.
Iā€™m really unsure what is going on here, but it must be some kind of syntax error.

Functions decorated with as_op canā€™t take keyword arguments. You could wrap it in a simple python function:

@theano...
def proc_test(alpha, sigma, Y):
    ...

with model:
    alpha = ...
    sigma = ...

    def logp(Y):
        return proc_test(alpha, sigma, Y)

    Y_obs = pm.DensityDist('Y_obs', logp, observed=Y)
1 Like

Thanks it worked!

Here are a few more changes that needed to be done too, if anyone is interested:

observed_ = Y[:min(Y.shape[0], len(predicted))]

since I noticed it was outputting one element instead of the array. I actually noticed my Y had an extra pair of , so I removed it, and then changed the code above which returns the needed array, and then

diff = (observed_ - predicted[:min(Y.shape[0], len(predicted))]) / sigma

to match the case where I have more predictions than in Y.

Afterwards, I came across some really weird NumPy errors at about 17% in the run: TypeError:

Cannot cast ufunc multiply output from (ā€˜float64ā€™) to dtype(ā€˜int32ā€™) with casting rule ā€˜same_kindā€™

so I had to change the scale function in metropolis.py this way (it didnā€™t like the *= apparently), but now it works:

# Switch statement
if acc_rate < 0.001:
    # reduce by 90 percent
    np.multiply(scale, 0.1, out=scale, casting='unsafe')
elif acc_rate < 0.05:
    # reduce by 50 percent
    np.multiply(scale, 0.5, out=scale, casting='unsafe')
elif acc_rate < 0.2:
    # reduce by ten percent
    np.multiply(scale, 0.9, out=scale, casting='unsafe')
elif acc_rate > 0.95:
    # increase by factor of ten
    np.multiply(scale, 10.0, out=scale, casting='unsafe')
elif acc_rate > 0.75:
    # increase by double
    np.multiply(scale, 2.0, out=scale, casting='unsafe')
elif acc_rate > 0.5:
    # increase by ten percent
    np.multiply(scale, 1.1, out=scale, casting='unsafe')

By the way, one last thing @aseyboldt , it seems to me that in your reply where you tell me to return the logp in the function directly, you write to put

return -0.5 * (diff * diff + np.log(2*np.pi) * sigma)

which @junpenglao changed into

return np.sum(-0.5 * (diff * diff + np.log(2*np.pi) * sigma))

but shouldnā€™t it be

return np.array(np.sum(-0.5 * (diff * diff + np.log(2np.pi) + 2np.log(sigma))))

instead? (nevermind the np.array that was also creating some bugs).
If you do the math for the log of the Normal pdf:

log[1 / (sigma sqrt(2pi)) * exp(-0.5 * (x-mu)^2 / sigma^2)] = -0.5 * [(x-mu)^2 / sigma^2 + log(2pi) +2*log(sigma)]

so the last term is not * sigma, but +2*log(sigma), no?

You are right about the logp of the normal, sorry about that. One would think that Iā€™ve used that enough times to get it rightā€¦

About the dtype issue: Could you try to change the scaling to 100.0 instead of 100? I think this a bug in pymc, where we donā€™t check that the scaling in a floating point number.

I will try that, revert my changes in metropolis.py, and report if there are any bugs, thanks for the suggestion!

I can now confirm that I had no problem with that during the whole run.

2 Likes