Modelling a timeseries of cumulative maximum

I also noticed the math processing error went away when I refreshed.

I’ve spent some time thinking about this and I think I’ve settled on an answer I’m happy with.

  1. Is my math correct?

No, I don’t think it is. The density of Y_n = \max \{ X_1, \ldots, X_n \}, where the X_i are i.i.d. and X_1 has pdf f_X and cdf F_X, is:

f_Y(y) = n[F_X(y)]^{n-1} f_X(y).

To derive this expression, you begin as you did: write things out in terms of the CDF and do some manipulations. However, as far as I can tell this only works in the 1D case. When you move to the joint CDF, things get complicated and it’s unclear to me how to proceed; in particular, the hard part seems to be conditioning Y_n on Y_{n-1}. I eventually tried a different approach, which I’ll detail below.
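
For completeness, the 1D derivation is just

F_Y(y) = P(Y_n \leq y) = P(X_1 \leq y, \ldots, X_n \leq y) = \prod_{i=1}^{n} P(X_i \leq y) = [F_X(y)]^n,

and differentiating with respect to y gives the density above.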

Disclosure: I’m no expert on order statistics, so I may be entirely wrong.

I think this problem is mathematically much harder than you may have anticipated. From the properties of maxima we know that Y_{n+1} = \max \{ X_1, \ldots, X_n, X_{n+1} \} = \max \{Y_n, X_{n+1} \}. Since X_{n+1} is independent of Y_1, \ldots, Y_n, this means that P(Y_{n+1} \leq y | Y_1, \ldots, Y_n) = P(Y_{n+1} \leq y | Y_n), i.e. the sequence Y_i is Markovian in some sense: the distribution of Y_{n+1} only depends on the previous Y_{n} and the value of X_{n+1}. Let Y_{1:i} denote the collection \{ Y_1, \ldots, Y_i \}. Then the likelihood for your time series of n observations becomes

p_{Y_{1:n}}(y_1, \ldots, y_n | \mu, \sigma) = p_{Y_1}(y_1 | \mu, \sigma) \prod_{i=1}^{n-1} p_{ Y_{i+1}| Y_{1:i}}(y_{i+1} | y_1, \ldots, y_{i}, \mu, \sigma) \\ = p_{Y_1}(y_1 | \mu, \sigma) \prod_{i=1}^{n-1} p_{ Y_{i+1}| Y_{i}}(y_{i+1} | y_{i}, \mu, \sigma).

Ok, the first factor is easy (Y_1 = X_1, so p_{Y_1} = f_X), so it looks like all we need is an expression for the density f_{Y_{n+1}| Y_{n}} of Y_{n+1} conditioned on an observation of Y_{n}.

Now we’re going to try a different way to figure out the cdf, by reasoning about how we could draw samples of the random variable Y_{n+1}|Y_{n} = y_n. When we go to simulate Y_{n+1}, we first draw a sample for X_{n+1}; call this sample x_{n+1}. Then we check whether that sample is larger or smaller than our current maximum y_n. If it’s smaller, we set y_{n+1} = y_{n}; otherwise we set y_{n+1} = x_{n+1}. In math notation this process can be written as follows (a simulation sketch follows the list):

  1. Flip a biased coin that yields heads with probability F_X(y_n) and tails with probability 1 - F_X(y_n).
  2. If the coin is heads, set y_{n+1} = y_n. If the coin is tails, draw y_{n+1} \sim Z, where Z is a random variable that admits a density
f_Z(z) = \frac{f_X(z)}{1 - F_X(y_n)} \chi_{z > y_n}.
  3. Return the value of y_{n+1} as your sample of Y_{n+1}.
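
Here’s a minimal simulation sketch of that procedure in plain NumPy, assuming X is the Normal(\mu = 10, \sigma = 1) from your example (the function names are just for illustration):

import numpy as np

def sample_next_max(y_n, mu=10.0, sigma=1.0, rng=None):
    """Draw one sample of Y_{n+1} | Y_n = y_n following the recipe above.
    Taking max(y_n, x_{n+1}) is equivalent to the coin flip plus truncated draw."""
    rng = np.random.default_rng() if rng is None else rng
    x_next = rng.normal(mu, sigma)   # draw X_{n+1}
    return max(y_n, x_next)

def simulate_series(n, mu=10.0, sigma=1.0, seed=None):
    """Simulate Y_1, ..., Y_n, the running maximum of n i.i.d. normals."""
    rng = np.random.default_rng(seed)
    y = rng.normal(mu, sigma)        # Y_1 = X_1
    ys = [y]
    for _ in range(n - 1):
        y = sample_next_max(y, mu, sigma, rng)
        ys.append(y)
    return np.array(ys)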

Intuitively, what we’ve done is build a new (conditional) CDF F_{Y_{n+1}|Y_n}: we take all of the probability mass (under F_{X}) for values less than y_n and concentrate it at the single point y_n. Above y_n, the CDF then continues exactly as F_X does, so that F_{Y_{n+1}|Y_n}(\infty) = 1.
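
In symbols, the conditional CDF is

F_{Y_{n+1}|Y_{n}}(y | y_n) = P(\max \{ y_n, X_{n+1} \} \leq y) = F_X(y) \, \chi_{y \geq y_n},

which jumps from 0 to F_X(y_n) at y = y_n and then follows F_X.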

(Plot: the cdf of your true RV X, a normal with \mu = 10, \sigma = 1.)

(Plot: the cdf for Y_{n+1}, conditioned on an observation Y_{n} = 10.)

So the random variable Y_{n+1}|Y_{n} is a mixed random variable; it has both a continuous part and a discrete part (an atom at y_n). Unfortunately, I think this is where your approach breaks down: the MCMC methods implemented in PyMC3 (Metropolis, NUTS, etc.) expect an ordinary density, and while you could write out a density for the conditioned Y_{n+1}, it would involve Dirac delta functions.
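
For concreteness, combining the coin flip with the truncated density from step 2, the conditional density would have to be written as

f_{Y_{n+1}|Y_{n}}(y | y_n) = F_X(y_n) \, \delta(y - y_n) + f_X(y) \, \chi_{y > y_n},

which is not an ordinary density with respect to Lebesgue measure.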

IMO, if my reasoning above is correct, then the best you can do in PyMC3 is to ignore the sequence of maximums and consider only the last one, Y_{N}. This throws out information, but then all you need is a density for Y_{N}, which you already know (see the formula above). You can take the logarithm of this density and write a custom PyMC3 distribution that returns this log density from its logp method. As long as you code the log density using functions in pm.math or other Theano code, you get the gradient of the log density for free and you’re good to go with NUTS.
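
Concretely, the log density you’d implement is just the logarithm of the formula above for Y_N:

\log f_{Y_N}(y) = \log N + (N - 1) \log F_X(y) + \log f_X(y),

where F_X and f_X are the cdf and pdf of the underlying Normal(\mu, \sigma).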

Your final code would look like:

with pm.Model() as model:
  mu = pm.Normal('mu', mu=10, sigma=5)
  sigma = pm.Exponential('sigma', lam=1)

  # Show the model only the last observation. N is the number of X_i drawn
  # leading up to Y_N; it will show up in your custom distribution definition.
  likelihood = YourCustomDistribution("y_N", mu=mu, sigma=sigma, N=N, observed=obss[-1])
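
If you’d rather not write a full Distribution subclass, here’s a rough sketch of the same idea using pm.DensityDist. Treat the helper name max_normal_logp, the value of N, and the obss array as placeholders for your own setup rather than a tested implementation:

import pymc3 as pm
import theano.tensor as tt

def max_normal_logp(mu, sigma, N):
    """log f_{Y_N}(y) = log N + (N - 1) * log F_X(y) + log f_X(y), X ~ Normal(mu, sigma)."""
    def logp(y):
        log_cdf = tt.log(0.5) + tt.log1p(tt.erf((y - mu) / (sigma * tt.sqrt(2.0))))  # log F_X(y)
        log_pdf = pm.Normal.dist(mu=mu, sigma=sigma).logp(y)                          # log f_X(y)
        return tt.log(N) + (N - 1) * log_cdf + log_pdf
    return logp

N = 100                 # hypothetical: number of X_i behind the observed maximum
y_last = obss[-1]       # your single observed Y_N

with pm.Model() as model:
    mu = pm.Normal('mu', mu=10, sigma=5)
    sigma = pm.Exponential('sigma', lam=1)
    y_N = pm.DensityDist('y_N', max_normal_logp(mu, sigma, N), observed=y_last)
    trace = pm.sample()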

This is unsatisfying because an entire time series then only gives you a single observation. You could perhaps feed in the final values Y_N of an ensemble of time series generated the same way, but depending on your application you may not have access to this.

TL;DR I don’t think it is possible to approach this problem (in a satisfactory way) in PyMC3 since you end up doing MCMC on mixed continuous/discrete distributions.

Out of curiosity, where is the data you’re trying to model coming from?
