Multiple Latent Gaussian Processes - As Function Inputs - ValueError: array must not contain infs or NaNs

I am trying to fit a series of parameter inputs to a function where the inputs are able to change over time (and eventually space).
The items follow a trajectory which can be approximated by:

y_i(Q, q_i, d_i, b_i)=(q_i^{-b_i}*(Q*d_i*(b_i-1)+q_i))^{\frac{1}{(1-b_i)}}

Q is a quantity that varies for each item over time.

q_i, d_i, b_i

are assumed constant for each item.
Assuming each item i is pulled sequentially in time and some form of a gaussian process sets the
q_i, d_i, b_i for the item then:

q_i(t_i) \sim \mathcal{GP}(m_q(t_i),\, k_q(t_i, t_i')) \,\\ d_i(t_i) \sim \mathcal{GP}(m_d(t_i),\, k_d(t,_i t_i')) \, \\ b_i(t_i) \sim \mathcal{GP}(m_b(t_i),\, k_b(t_i, t_i')) \,

where t_i is the time that sample i was drawn.
As an example the following code creates a data set

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(seed=0)
n = 30 # number of samples

t_i = np.linspace(0,100, n)[:, None]  
# the samples are pulled at even intervals of time between t=0 and t=100

# create a generic function w.r.t. time
ℓ_true = 10.0
η_true = 3.0
cov_func = (η_true ** 2) * pm.gp.cov.Matern52(1, ℓ_true)
mean_func = pm.gp.mean.Zero()
f_true = rng.multivariate_normal(
    mean_func(t_i).eval(),
    cov_func(t_i).eval() +  1e-8*np.eye(n),
    1
).flatten()

# create the individual samples scaling the inputs as appropriate for my case
# give each parameter a little noise as well
q_is = 2000+f_true*100+rng.standard_t(10,size=n)*50
d_is = .03-f_true/500+rng.standard_t(10,size=n)*.002
b_is = .95+f_true/20+rng.standard_t(10, size=n)*.01

# the range of Q that samples are monitored through
Q = np.linspace(0,100000, 100)

def equation(Q, qi, di, bi):
    return (qi**(-bi)*(Q*di*(bi-1)+qi))**(1/(1-bi))

# Create simulation where each row is a sample of qi, di, bi 
# and each column is the value of the equation at the respective Q
sample_data = equation(
    np.tile(Q,(len(q_is),1)),
    np.tile(q_is,(len(Q),1)).T,
    np.tile(d_is, (len(Q),1)).T,
    np.tile(b_is, (len(Q), 1)).T
)

# separate the feature data (time in this case) for each item in a dataframe
feature_data = pd.DataFrame(
    np.hstack([np.arange(n)[:, None],t_i]),
    columns = ['sample_id', 'sample_init_time']
)

# create a dataframe identifying sample_id with the equation output and respective Q
# first flatten and expand the arrays appropriately

state = sample_data.ravel()
sample_id =np.tile(np.arange(n), (len(Q),1)).T.ravel()
Qs = np.tile(Q, (n,1)).ravel()

state_data = pd.DataFrame(
    np.array([sample_id, state, Qs]).T,
    columns = ['sample_id', 'state', 'Q']
)

# add noise to the true state
state_data["measured_values"] = state_data['state']+100*rng.standard_t(3, size=len(state_data))
# clip negative values because in this case they would not be possible
state_data["measured_values"] = state_data['measured_values']*(state_data['measured_values']>0)
state_data['sample_id'] = state_data['sample_id'].astype(int)

Not sure if that is clear but hopefully you can see that we take a sample and we watch the decay. Some time later we take a different sample and that decay profile might be similar to the previous decay profile or a least we would expect it to be.

Given that this process happens what are the best estimates of qi, di, bi for each sample. Now the model

with pm.Model() as model:
    # I start off with a lot of the variables appropriately scaled
    # so that the sampler is likely drawing from random variables that are within
    # a couple of orders of magnitude between the largest and smallest variables.
    # I have read that can help
    q_scaled = pm.HalfCauchy("qi_scaled", 2)
    d_scaled = pm.HalfCauchy("d_scaled", 2)
    b_scaled = pm.Beta("b_scaled", alpha=1, beta=1, testval=.65)  
    # have to set a test value different than mean because function is undefined at b=1
    # when this is rescaled by multiplying by 2 the mean would be 1 without that
    
    ls = pm.InverseGamma("ls", 1, 1)
    q_η = pm.HalfCauchy("q_η", 1)
    d_η = pm.HalfCauchy("d_η", 1)
    b_η = pm.HalfCauchy("b_η", 1)

    q_cov = (q_η**2)*pm.gp.cov.ExpQuad(1, ls)
    q_gp = pm.gp.Latent(cov_func=q_cov)
    q_t = q_gp.prior("q_t", X= feature_data.sample_init_time.values[:,None])
    
    d_cov = (d_η**2)*pm.gp.cov.ExpQuad(1, ls)
    d_gp = pm.gp.Latent(cov_func=d_cov)
    d_t = d_gp.prior("d_t", X= feature_data.sample_init_time.values[:,None])
    
    b_cov = (b_η**2)*pm.gp.cov.ExpQuad(1, ls)
    b_gp = pm.gp.Latent(cov_func=b_cov)
    b_t = b_gp.prior("b_t",X= feature_data.sample_init_time.values[:,None])
    
    qi = pm.Deterministic("qi", (q_scaled+q_t)*1000)
    di = pm.Deterministic("di", (d_scaled+d_t)/100)
    bi = pm.Deterministic("bi", (b_scaled+b_t)*2)
    
    q = qi[state_data.sample_id.values]
    d = di[state_data.sample_id.values]
    b = bi[state_data.sample_id.values]
    
    base = q**(-b)*(state_data.Q.values*d*(b-1)+q)
    exponent = (1/(1-b))
    
    # add potential where base goes negative because base**exponent is likely to be undefined
    undefined = pm.Potential("undefined", pm.Exponential.dist(1).logp(-base*(base<0)))
    
    base_ = base*(base>0)
    mu = base_**exponent
    
    ν = pm.InverseGamma("measurement_nu", 1,1)
    σ = pm.HalfCauchy("measurement_sigma_scaled", 1)
    
    obs = pm.StudentT(
    "observed",
    mu=mu,
    sigma=σ*100,
    nu=ν,
    observed =state_data.measured_values.values
)

and then trying to sample

with model:
    trace=pm.sample(tune=1000, draws=1000, return_inferencedata=True)

I get the failure

Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [measurement_sigma_scaled, measurement_nu, b_t_rotated_, d_t_rotated_, q_t_rotated_, b_η, d_η, q_η, ls, b_scaled, d_scaled, qi_scaled]

 0.00% [0/8000 00:00<00:00 Sampling 4 chains, 0 divergences]
---------------------------------------------------------------------------
RemoteTraceback                           Traceback (most recent call last)
RemoteTraceback: 
"""
Traceback (most recent call last):
  File "C:\Users\rlarson\Anaconda3\envs\complex_prod\lib\site-packages\theano\compile\function\types.py", line 974, in __call__
    self.fn()
  File "C:\Users\rlarson\Anaconda3\envs\complex_prod\lib\site-packages\theano\graph\op.py", line 476, in rval
    r = p(n, [x[0] for x in i], o)
  File "C:\Users\rlarson\Anaconda3\envs\complex_prod\lib\site-packages\theano\tensor\slinalg.py", line 253, in perform
    rval = scipy.linalg.solve_triangular(A, b, lower=False)
  File "C:\Users\rlarson\Anaconda3\envs\complex_prod\lib\site-packages\scipy\linalg\basic.py", line 334, in solve_triangular
    b1 = _asarray_validated(b, check_finite=check_finite)
  File "C:\Users\rlarson\Anaconda3\envs\complex_prod\lib\site-packages\scipy\_lib\_util.py", line 262, in _asarray_validated
    a = toarray(a)
  File "C:\Users\rlarson\Anaconda3\envs\complex_prod\lib\site-packages\numpy\lib\function_base.py", line 488, in asarray_chkfinite
    raise ValueError(
ValueError: array must not contain infs or NaNs

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\rlarson\Anaconda3\envs\complex_prod\lib\site-packages\pymc3\parallel_sampling.py", line 137, in run
    self._start_loop()
  File "C:\Users\rlarson\Anaconda3\envs\complex_prod\lib\site-packages\pymc3\parallel_sampling.py", line 191, in _start_loop
    point, stats = self._compute_point()
  File "C:\Users\rlarson\Anaconda3\envs\complex_prod\lib\site-packages\pymc3\parallel_sampling.py", line 216, in _compute_point
    point, stats = self._step_method.step(self._point)
  File "C:\Users\rlarson\Anaconda3\envs\complex_prod\lib\site-packages\pymc3\step_methods\arraystep.py", line 276, in step
    apoint, stats = self.astep(array)
  File "C:\Users\rlarson\Anaconda3\envs\complex_prod\lib\site-packages\pymc3\step_methods\hmc\base_hmc.py", line 139, in astep
    start = self.integrator.compute_state(q0, p0)
  File "C:\Users\rlarson\Anaconda3\envs\complex_prod\lib\site-packages\pymc3\step_methods\hmc\integration.py", line 44, in compute_state
    logp, dlogp = self._logp_dlogp_func(q)
  File "C:\Users\rlarson\Anaconda3\envs\complex_prod\lib\site-packages\pymc3\model.py", line 734, in __call__
    output = self._theano_function(array)
  File "C:\Users\rlarson\Anaconda3\envs\complex_prod\lib\site-packages\theano\compile\function\types.py", line 987, in __call__
    raise_with_op(
  File "C:\Users\rlarson\Anaconda3\envs\complex_prod\lib\site-packages\theano\link\utils.py", line 508, in raise_with_op
    raise exc_value.with_traceback(exc_trace)
  File "C:\Users\rlarson\Anaconda3\envs\complex_prod\lib\site-packages\theano\compile\function\types.py", line 974, in __call__
    self.fn()
  File "C:\Users\rlarson\Anaconda3\envs\complex_prod\lib\site-packages\theano\graph\op.py", line 476, in rval
    r = p(n, [x[0] for x in i], o)
  File "C:\Users\rlarson\Anaconda3\envs\complex_prod\lib\site-packages\theano\tensor\slinalg.py", line 253, in perform
    rval = scipy.linalg.solve_triangular(A, b, lower=False)
  File "C:\Users\rlarson\Anaconda3\envs\complex_prod\lib\site-packages\scipy\linalg\basic.py", line 334, in solve_triangular
    b1 = _asarray_validated(b, check_finite=check_finite)
  File "C:\Users\rlarson\Anaconda3\envs\complex_prod\lib\site-packages\scipy\_lib\_util.py", line 262, in _asarray_validated
    a = toarray(a)
  File "C:\Users\rlarson\Anaconda3\envs\complex_prod\lib\site-packages\numpy\lib\function_base.py", line 488, in asarray_chkfinite
    raise ValueError(
ValueError: array must not contain infs or NaNs
Apply node that caused the error: Solve{A_structure='upper_triangular', lower=False, overwrite_A=False, overwrite_b=False}(InplaceDimShuffle{1,0}.0, Elemwise{Composite{((i0 * i1) - i2)}}[(0, 0)].0)
Toposort index: 237
Inputs types: [TensorType(float64, matrix), TensorType(float64, matrix)]
Inputs shapes: [(30, 30), (30, 30)]
Inputs strides: [(240, 8), (240, 8)]
Inputs values: ['not shown', 'not shown']
Outputs clients: [[InplaceDimShuffle{1,0}(Solve{A_structure='upper_triangular', lower=False, overwrite_A=False, overwrite_b=False}.0)]]

Backtrace when the node is created(use Theano flag traceback__limit=N to make it longer):
  File "C:\Users\rlarson\Anaconda3\envs\complex_prod\lib\site-packages\theano\gradient.py", line 1393, in access_grad_cache
    term = access_term_cache(node)[idx]
  File "C:\Users\rlarson\Anaconda3\envs\complex_prod\lib\site-packages\theano\gradient.py", line 1061, in access_term_cache
    output_grads = [access_grad_cache(var) for var in node.outputs]
  File "C:\Users\rlarson\Anaconda3\envs\complex_prod\lib\site-packages\theano\gradient.py", line 1061, in <listcomp>
    output_grads = [access_grad_cache(var) for var in node.outputs]
  File "C:\Users\rlarson\Anaconda3\envs\complex_prod\lib\site-packages\theano\gradient.py", line 1393, in access_grad_cache
    term = access_term_cache(node)[idx]
  File "C:\Users\rlarson\Anaconda3\envs\complex_prod\lib\site-packages\theano\gradient.py", line 1061, in access_term_cache
    output_grads = [access_grad_cache(var) for var in node.outputs]
  File "C:\Users\rlarson\Anaconda3\envs\complex_prod\lib\site-packages\theano\gradient.py", line 1061, in <listcomp>
    output_grads = [access_grad_cache(var) for var in node.outputs]
  File "C:\Users\rlarson\Anaconda3\envs\complex_prod\lib\site-packages\theano\gradient.py", line 1393, in access_grad_cache
    term = access_term_cache(node)[idx]
  File "C:\Users\rlarson\Anaconda3\envs\complex_prod\lib\site-packages\theano\gradient.py", line 1220, in access_term_cache
    input_grads = node.op.L_op(inputs, node.outputs, new_output_grads)

HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.
"""

I have tried this with a linear model and do not get this type of error. There is a little bug where b could be close to 1 causing \frac{1}{1-b} to be undefined but without any samples it is hard to tell if this is the spot that is causing the issue. If it is are there any ideas on how best to smooth over that spot?

Update: I used a cubic spline function to smooth over the discontinuity. The spline allows for both a smooth function and a smooth gradient with respect to b. I chose to use the spline for values of b

.999<b<1.001

The error still happens. On the other hand if I initialize with find_MAP it runs.

Thanks in advance.

I think the first thing to try is to fix all the GP hyperparameters, (lengthscales and etas) to reasonable values. Hopefully this prevents the immediate error. I would then gradually loosen up the priors until you start seeing the problem again, and hopefully this shows what part of the parameter space is causing problems. Likely you don’t want to put any prior mass there anyway. This also has the side benefit that your code should run a little quicker while developing your model.

What you are seeing could be happening because your priors on the hyperparameters are uninformative. GPs are so flexible on their own, that even with pretty informative priors they can model a very big space of functions, see here.

1 Like