I am using state-space models for time-series modeling. The model is

$$x(t) = g(x(t-1)) + w(t), \qquad w(t) \sim \mathcal{N}(0, Q),$$
$$y^i(t) \sim \text{Poisson}\big(f(x(t), z^i(t))\big),$$

where $i$ indexes the $i$th sample at time $t$, $g$ and $f$ are non-linear models, and $\sigma$ (the diagonal of $Q$) is among the parameters to be estimated along with $f$ and $g$. $y^i(t)$ is a scalar observed value, and for each $i$ and $t$ there is a corresponding feature vector $z^i(t)$. The goal is to estimate $g$, $f$, and $\sigma$. Note that for each time point $t$ there are 400k samples, and there are 100 time points in total.
For modeling the state space I am using PyTensor's `scan` function:
```python
import pytensor
import pytensor.tensor as pt
import pymc as pm
from pymc.pytensorf import collect_default_updates

# Initial latent state and diagonal innovation covariance
x0_ar = pm.Normal("x0_ar", 0, sigma=1, initval=init_ar, shape=(latent_factors,), rng=rng)
sigmas_Q_ar = pm.InverseGamma("sigmas_Q_ar", alpha=3, beta=0.5, shape=(latent_factors,), rng=rng)
Q_ar = pt.diag(sigmas_Q_ar)

def step_simple(x, A, Q, bais_latent):
    # One step of the latent dynamics: x_{t+1} = g(x_t) + innovation
    innov = pm.MvNormal.dist(mu=0.0, tau=Q)
    next_x = g(x) + innov
    return next_x, collect_default_updates([next_x])

ar_states_pt, ar_updates = pytensor.scan(
    step_simple,
    outputs_info=[x0_ar],
    non_sequences=[A_ar, Q_ar, bais_latent],
    n_steps=T,
    strict=True,
)
assert ar_updates

# Register the scan output as a random variable in the model
model_minibatch.register_rv(
    ar_states_pt, name="ar_states_pt", initval=pt.zeros((T, latent_factors))
)
ar_states = pm.Deterministic(
    "ar", pt.concatenate([x0_ar.reshape((1, latent_factors)), ar_states_pt], axis=0)
)
obs = pm.Poisson("obs", f(ar_states, z), observed=data_out, total_size=424560)
```
`ar_states` goes as input to the `f` function. When I run inference with ADVI, the estimates of `ar_states` do not follow the state-space dynamics; they just fit the training data in whatever way minimizes the observation loss. The reason is that `ar_states` should maximize two likelihood terms: one for the observed data and one for the state-space dynamics. Due to the large number of samples (>400k per time point), the likelihood for the observed data simply dominates the state-space dynamics. Is there a way to extract the conditional log-probability of the states and weight it so that the optimizer balances the two log-probabilities?