Fit interval as a model parameter

edbennett · July 14, 2020, 4:08pm

Hi,

I have some data whose functional form I know, but only in some interval—outside the interval I do not have good control of the functional form, and neither do I a priori know what the interval is. Up until now I’ve been running a separate fit for each possible window of data, and then looking for plateaux/islands of stability, but the results of this process are hard to visualise, reason about, or derive a single final value from. I’d like to be able to define a single model that treats the fit region bounds as parameters on the same footing as the other parameters to the fit form.

I’ve put together a relatively minimal example of what I’m trying to achieve. Constructing some sample data as:

import matplotlib.pyplot as plt
import numpy as np

def weird_function(x, gate1, gate2, A, B):
    return (
        ((gate1 < x) & (x < gate2)) * (A * (x - gate1) + B)
        + (x < gate1) * B * (1 - np.sin((gate1 - x) * 3) / 10)
        + (x > gate2) * (B + (gate2 - gate1) * A + A / 5 * np.sin((x - gate2) * 10) - (x - gate2) ** 2 / 4)
    )

A = 2
B = 3
gate1 = 3
gate2 = 7
xdata = np.linspace(0, 10, 50)
ydata = weird_function(xdata, gate1, gate2, A, B)
yerror = np.random.random(len(xdata)) / 3 + 0.2

plt.errorbar(xdata, ydata, yerr=yerror, fmt='.')
plt.show()

I’d like the model to locate and fit parameters for the straight line region, giving back values for A, B, gate1, and gate2. (The real data I want to apply this to aren’t linear; this is a simplification for the sake of readability here.)

The model I’ve tried is:

import pymc3 as pm
import theano.tensor as tt
import theano

xtensor = theano.shared(xdata)
ytensor = theano.shared(ydata)
yerrtensor = theano.shared(yerror)

xmin = min(xdata)
xmax = max(xdata)
xspan = xmax - xmin

model = pm.Model()

with model:
    A = pm.Normal('A', mu=5, sigma=2)
    B = pm.Normal('B', mu=5, sigma=2)

    x_lower_bound = xmin + pm.Gamma('x_lower_bound', mu=xspan / 10, sigma=xspan / 2)
    x_upper_bound = xmax - pm.Gamma('x_upper_bound', mu=xspan / 10, sigma=xspan / 2)

    x_to_fit = pm.Deterministic('x_to_fit', xtensor[tt.gt(tt.lt(x_lower_bound, xtensor), x_upper_bound).nonzero()])
    y_to_fit = pm.Deterministic('y_to_fit', ytensor[tt.gt(tt.lt(x_lower_bound, xtensor), x_upper_bound).nonzero()])
    yerr_to_fit = pm.Deterministic('yerr_to_fit', yerrtensor[tt.gt(tt.lt(x_lower_bound, xtensor), x_upper_bound).nonzero()])

    y_form = pm.Deterministic('y_form', A * x_to_fit + B)
    y_obs = pm.Normal('y_obs', mu=y_form, sigma=yerr_to_fit, observed=y_to_fit)

    trace = pm.sample(draws=1000, tune=1000)

This completes on the sample data, but generates warnings and doesn’t give good fit results—the fit bounds seem to try and run away.

It fails on my real data, with the error “could not broadcast input array from shape (1840) into shape (2720)”, presumably because the fit region changing affects the number of points being fitted, which is assumed constant; I suspect this isn’t seen in the example case because the bound never gets to the point that the data change shape?

I think I’m doing this wrong, but can’t work out how to do it properly. Can anyone offer any suggestions, please?

Thanks!

DanWeitzenfeld · July 14, 2020, 8:23pm

I think you need to model it such that the points outside the bounds are drawn from a different distribution - they can’t just not contribute to the likelihood.

This is hacky but hopefully it can point you in the right direction. I modeled points outside the bounds as coming from a super wide normal.

gist.github.com

https://gist.github.com/DanielWeitzenfeld/3297bcb590998a126e3e4c632b783594

Weird Function.ipynb

{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "%matplotlib inline"
   ]

This file has been truncated. show original

You might also check out https://docs.pymc.io/notebooks/dependent_density_regression.html

edbennett · July 21, 2020, 11:30am

That’s incredibly helpful, thank you!

Since my data are (very) non-linear (and have a relatively constant relative error), I think I need to adjust the sigma so that it scales with the data, so as not to bias the fit towards matching the higher data more closely. I think that in the definition of n, multiplying sigma by the same expression used fir mu may work for this; I’ll see what happens.

Thanks again!

Topic		Replies	Views
Non-centered hierarchical model with constraints Questions	12	2645	September 30, 2020
Beginning user, problem set correctly (experimental data fit)? v5 modeling	2	332	January 16, 2023
Censored pymc for power-law distribution of column density v5 modeling , sampling , arviz	10	368	February 21, 2024
Uniform distributions with variable width issue Questions	1	608	April 20, 2019
Modelling problem: y_est = (1-a)specarr_old + anp.squeeze(SSParr[np.argmin(abs(SSPage - b))]) Questions	0	387	September 4, 2019

Fit interval as a model parameter

Related topics