OK, thanks for all the input. This seems to be harder than I initially expected. What if I simplify the problem? Let's say I have a reference day and the corresponding value on that day (V0). Now I pick the 6 measurement days before it and the 6 after it. DB1 - DB6 are the days before the reference day and D1 - D6 the days after it (all given as distance to the reference day). VB1 - VB6 and V1 - V6 contain the corresponding values.
Couldn't I set up a model using V1 - V6 as response variables and the others as predictors? If yes, couldn't I then do an out-of-sample prediction of V1 - V6 for a new time series?
With ChatGPT I created sample data for that:
```python
import random

import numpy as np
import pandas as pd

# Seed both generators: the code draws from `random` as well as `np.random`
random.seed(42)
np.random.seed(42)


def simulate_measurement():
    # Simulate one time series over 2 years (720 days)
    num_days = 720
    initial_score = random.uniform(40, 48)
    decline_rate_per_day = random.uniform(0.5 / 30, 1.5 / 30)  # Per-day decline
    # 72 distinct measurement days drawn from day 20 to day 720
    # (roughly one measurement every 10 days on average)
    days = np.sort(np.random.choice(range(20, 721), num_days // 10, replace=False))
    scores = initial_score - decline_rate_per_day * days
    # Ensure we have 6 measurements before and after the reference day
    ref_index = random.randint(6, len(days) - 7)
    ref_day = days[ref_index]
    ref_value = scores[ref_index]
    # Pick the 6 measurements directly before and after the reference day,
    # ordered from nearest to farthest
    indices_before = list(range(ref_index - 6, ref_index))[::-1]
    indices_after = list(range(ref_index + 1, ref_index + 7))
    db = [ref_day - days[i] for i in indices_before]
    vb = [scores[i] for i in indices_before]
    d = [days[i] - ref_day for i in indices_after]
    v = [scores[i] for i in indices_after]
    return {
        'V0': ref_value,
        'DB1': db[0], 'VB1': vb[0],
        'DB2': db[1], 'VB2': vb[1],
        'DB3': db[2], 'VB3': vb[2],
        'DB4': db[3], 'VB4': vb[3],
        'DB5': db[4], 'VB5': vb[4],
        'DB6': db[5], 'VB6': vb[5],
        'D1': d[0], 'V1': v[0],
        'D2': d[1], 'V2': v[1],
        'D3': d[2], 'V3': v[2],
        'D4': d[3], 'V4': v[3],
        'D5': d[4], 'V5': v[4],
        'D6': d[5], 'V6': v[5],
    }


# One row per simulated time series
data_list = [simulate_measurement() for _ in range(100)]
df = pd.DataFrame(data_list)
# Display the resulting DataFrame
print(df)
```
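
To make the modelling idea concrete, here is a minimal sketch of what such a model could look like. It fits a plain ordinary least-squares regression with V1 - V6 as joint responses and everything else as predictors. The helper names `fit_multi_output` and `predict_multi_output` and the lists `PREDICTORS` and `RESPONSES` are made up for illustration. Treating D1 - D6 as known predictors (the days for which a prediction is wanted) is an assumption, and a purely linear model is only a baseline: it may not capture how the decline rate interacts with the day distances.

```python
import numpy as np
import pandas as pd

# Assumed column layout, taken from the description in the post:
# predictors are V0, DB1-DB6, VB1-VB6 and D1-D6; responses are V1-V6.
PREDICTORS = (
    ["V0"]
    + [f"DB{i}" for i in range(1, 7)]
    + [f"VB{i}" for i in range(1, 7)]
    + [f"D{i}" for i in range(1, 7)]
)
RESPONSES = [f"V{i}" for i in range(1, 7)]


def _design_matrix(frame: pd.DataFrame) -> np.ndarray:
    """Predictor columns with a leading column of ones for the intercept."""
    X = frame[PREDICTORS].to_numpy(dtype=float)
    return np.column_stack([np.ones(len(frame)), X])


def fit_multi_output(train: pd.DataFrame) -> np.ndarray:
    """Least-squares fit of all six responses at once (hypothetical helper).

    Returns a coefficient matrix of shape (1 + n_predictors, n_responses).
    """
    Y = train[RESPONSES].to_numpy(dtype=float)
    coef, *_ = np.linalg.lstsq(_design_matrix(train), Y, rcond=None)
    return coef


def predict_multi_output(coef: np.ndarray, new: pd.DataFrame) -> pd.DataFrame:
    """Predict V1-V6 for rows that contain only the predictor columns."""
    return pd.DataFrame(_design_matrix(new) @ coef, columns=RESPONSES, index=new.index)
```

With the `df` from the simulation, one could fit on part of the rows and predict the rest, for example `coef = fit_multi_output(df.iloc[:80])` followed by `pred = predict_multi_output(coef, df.iloc[80:])`, and then compare `pred` with the held-out V1 - V6 columns to judge the out-of-sample error.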