# How to apply signal filtering in PyMC?

I was using PyMC and try to follow blog by Dr. Juan Camilo Orduz, and a blog by Dr. Robert Kübler. In the blog he use ` GaussianRandomWalk` to model time varying coefficient. However when I try it on my own dataset, the random walk coefficients overfit, and cause other prior not to converge (`r_hat > 2`). So instead of modeling time-varying coefficient as pure `GaussianRandomWalk`, I think transforming the input using signal filtering method like `savgol_filter` would be nice

However if I apply the `savgol_filter` directly it give me

``````ValueError: setting an array element with a sequence.
``````

which is somewhat expected as the input variable is a pytensor, but `savgol_filter` need a numpy input

Another way I can think of is to apply `convolution` to the signal, however the document seems to be missing. I opened an issue here just in case

I’ve tried implement convolution operation, but my code is very slow

``````def pt_conv1d(a, v, sig_sz=None, mode="full"):
# Based on https://stackoverflow.com/questions/71309358/what-is-the-best-way-to-implement-1d-convolution-in-python
assert mode in ["full", "valid", "same"]
kernel = v[::-1]

if sig_sz is None:
sig_sz = a.shape
ker_sz = v.shape

if len(kernel) == 1:
mode = "full"
if mode == "full":
min_idx = 0
max_idx = sig_sz + ker_sz - 1
elif mode == "valid":
min_idx = ker_sz - 1
max_idx = -1*(ker_sz - 1)
else:
min_idx = ker_sz//2 - 1
max_idx = sig_sz + ker_sz - min_idx - 2

return pt.as_tensor_variable([
pm.math.dot(
a[max(0, i): min(i+ker_sz, sig_sz)],
kernel[max(-i,0): sig_sz-i*(sig_sz-ker_sz<i)],
) for i in range(1 - ker_sz, sig_sz)
][min_idx: max_idx])
``````

So is there a recommend way to apply signal filtering?

You can do 1d convolution using `conv2d`, you just have to add a dummy dimension then remove it when you’re done. I don’t have any PyMC or PyTensor examples of this, but I found this discussion about 1d convolution from the Theano team, which linked to these examples in the Lasagna library. Everything should work exactly the same (this part of the code base hasn’t changed since back then afaik), you’ll just have to rename some imports.

The convolution library is a bit of an orphan from the days when Theano was all about neural networks. There used to be comprehensive docs for it but they seem to have gotten lost in the shuffle. I found them on the wayback machine, though, along with the API docs here. Maybe you could rescue it @jessegrabowski

I’ve tried it and it work! If I use `pymc` sampler it is okay, but if I use a `numpyro` sampler it error out

``````NotImplementedError: No JAX conversion for the given `Op`: AbstractConv2d{convdim=2, border_mode=(0, (2, 1)), subsample=(1, 1), filter_flip=True, imshp=(None, None, None, None), kshp=(None, None, None, None), filter_dilation=(1, 1), num_groups=1, unshared=False}
``````

I think this is related to No JAX conversion for aesara convolution. But for now, using `pymc` sampler is good enough

Also when I tested on `pymc=5.1.2, python=3.11` it run just fine, but `pymc=5.2.0, python=3.10` never pass the sampling stage (just hang)

Sanity test code

``````import pytensor.tensor as pt
import pytensor.tensor.conv
import numpy as np

arr = np.array([1, 1, 2, 2, 1])

for ker in [[1, 1, 1, 3], [1, 1, 1, 3, 1]]:
ker = np.array(ker)
pt_arr, pt_ker = pt.as_tensor(arr), pt.as_tensor(ker)

print(np.convolve(arr, ker, mode="same"))
print(
pt.reshape(
pt.conv.conv2d(
pt_arr.reshape((1, 1, 1, pt_arr.shape)),
pt_ker.reshape((1, 1, 1, pt_ker.shape)),
border_mode=((0, 0), (ker.shape//2, ker.shape//2 - ((ker.shape + 1)%2)))
), (pt_arr.shape,)
).eval()
)
print(np.convolve(arr, ker, mode="valid"))
print(
pt.reshape(
pt.conv.conv2d(
pt_arr.reshape((1, 1, 1, pt_arr.shape)),
pt_ker.reshape((1, 1, 1, pt_ker.shape)),
border_mode="valid"
), (pt_arr.shape - (pt_ker.shape - 1),)
).eval()
)
print(np.convolve(arr, ker, mode="full"))
print(
pt.reshape(
pt.conv.conv2d(
pt_arr.reshape((1, 1, 1, pt_arr.shape)),
pt_ker.reshape((1, 1, 1, pt_ker.shape)),
border_mode="full"
), (pt_arr.shape + (pt_ker.shape - 1),)
).eval()
)
``````

## Update

I port code of `savitzky_golay` to work with PyTensor

``````def pt_savitzky_golay(x, window_size: int, order: int, deriv: int=0, rate=1):
# Assume x.shape >= window_sz

order_range = range(order+1)
half_window = (window_size -1) // 2

# precompute coefficients
b = np.mat([[k**i for i in order_range] for k in range(-half_window, half_window+1)])
m = np.linalg.pinv(b).A[deriv] * rate**deriv * np.math.factorial(deriv)

# pad the signal at the extremes with
# values taken from the signal itself
firstvals = x - pt.abs( x[1:half_window+1][::-1] - x )
lastvals = x[-1] + pt.abs( x[-half_window-1:-1][::-1] - x[-1] )

x = pt.concatenate((firstvals, x, lastvals))

return (
pt.reshape(
pt.conv.conv2d(
x.reshape((1, 1, 1, x.shape)),
m[::-1].reshape((1, 1, 1, m[::-1].shape)),
border_mode="valid"
), (x.shape - m[::-1].shape + 1, )
)
)

xarr = pt.as_tensor(np.array([1, 2, 3, 10, 3, 2, 4, 1, 3, 4, 0], dtype=np.float64))

print(savitzky_golay(xarr.eval(), 5, 2)) # From https://gist.github.com/krvajal/1ca6adc7c8ed50f5315fee687d57c3eb
print(pt_savitzky_golay(xarr, 5, 2).eval())
``````
1 Like

You could also try compiling it to rust with `nutpie` and see if that works.

Tried that and got the following error (`nutpie 0.5.0` via PyPi)

``````AttributeError: 'Model' object has no attribute 'n_dim'
``````

Maybe I will look into it later, thanks

1 Like

Adding (length safe) Exponential Weighted Moving Average based on this answer to have same functionality as `pandas.DataFrame.ewm`

Also this one support `numpyro` sampler

``````import numpy as np
import pytensor.tensor as pt
import matplotlib.pyplot as plt

def get_max_row_size(alpha, dtype=float):
assert 0. <= alpha < 1.
epsilon = np.finfo(dtype).tiny
return int(np.log(epsilon)/np.log(1-alpha)) + 1

def pt_ewma(data, alpha, shape: int, max_chunk_size: int):
# Based on: https://stackoverflow.com/questions/42869495/numpy-version-of-exponential-weighted-moving-average-equivalent-to-pandas-ewm

def pt_ewma_func(x, offset):
scaling_factors = (1 - alpha)**(pt.arange(x.shape + 1))
out = (x * (alpha * scaling_factors[-2]) / scaling_factors[:-1]).cumsum()
out /= scaling_factors[-2::-1]
out += offset * scaling_factors[1:]
return out

n_chunks = shape // max_chunk_size
n_trails = shape % max_chunk_size

if n_chunks == 0:
return pt_ewma_func(data, offset=data)

scaling_factors = (1 - alpha) ** pt.arange(1, max_chunk_size + 1)
last_scaling_factor = scaling_factors[-1]

out_main = pt.as_tensor_variable([
pt_ewma_func(data[ i*max_chunk_sz: (i + 1)*max_chunk_sz ], offset=0) for i in np.arange(n_chunks)
])
offsets = [data]
for i in np.arange(1, n_chunks):
offsets.append(offsets[i - 1] * last_scaling_factor + out_main[i - 1, -1])
offsets = pt.as_tensor_variable(offsets)
if n_chunks > 0:
out_main += offsets.reshape((n_chunks, 1)) * scaling_factors.reshape((1, max_chunk_size))

out_tail =  [] if n_trails == 0 else pt_ewma_func(data[-n_trails:], offset=out_main[-1, -1])
return pt.concatenate([out_main.reshape((n_chunks * max_chunk_size, )), out_tail])

window_sz = 30
alpha = 2/(window_sz + 1.0)
max_chunk_sz = get_max_row_size(alpha, np.float32)

np.random.seed(1337)
grw = np.cumsum(np.random.normal(size=max_chunk_sz + 10))
pt_grw = pt.as_tensor_variable(grw)

plt.figure(figsize=(15, 5))
plt.plot(grw)
plt.plot(pt_ewma(pt_grw, alpha, shape=grw.shape, max_chunk_size=max_chunk_sz).eval())
plt.show()
``````

And median filtering - model after `scipy.signal.medfilt`. This version will have boundary artifacts, but I’m not sure how to do it fast (`numpyro` not supported)

``````def pt_medfilt(arr, kernel_sz, fill_value=0):
if kernel_sz % 2 == 0:
med_arr = pt.stack([
*[pt.concatenate([arr[i:], (fill_value*pt.ones(i))]) for i in range(kernel_sz // 2, 0, -1)],
*[pt.concatenate([(fill_value*pt.ones(i)),arr[:-i]]) for i in range(1, kernel_sz // 2 + 1)],
]).sort(axis=0)
idx = (kernel_sz // 2)
return (med_arr[idx, :] + med_arr[idx + 1, :])/2
else:
med_arr = pt.stack([
*[pt.concatenate([arr[i:], (fill_value*pt.ones(i))]) for i in range(kernel_sz // 2, 0, -1)],
arr,
*[pt.concatenate([(fill_value*pt.ones(i)),arr[:-i]]) for i in range(1, kernel_sz // 2 + 1)],
]).sort(axis=0)
idx = (kernel_sz // 2)
return med_arr[idx, :]

window_sz = 30

np.random.seed(1337)
grw = np.cumsum(np.random.normal(size=1000))
pt_grw = pt.as_tensor_variable(grw)

plt.figure(figsize=(15, 5))
plt.plot(grw)
plt.plot(pt_medfilt(pt_grw, window_sz).eval())
plt.show()
``````