I am using state space models for time series modeling using the following model

where i corresponds to ith sample at time t, g and f are non-linear models and \sigma is part of parameters to be estimated along with f and g. y^i(t) is a scaler observed value and for each i and t, there corresponds a feature vector z^i(t). The goal is to estimate g, f and \sigma. Note that for each time point t there are 400k samples, and there are total 100 time points.

For modeling state space I am using scan function

```
x0_ar = pm.Normal("xo_ar", 0, sigma=1, initval = init_ar, shape=(latent_factors), rng=rng)
sigmas_Q_ar = pm.InverseGamma('sigmas_Q_ar', alpha=3,beta=0.5, shape= (latent_factors), rng=rng)
Q_ar = pt.diag(sigmas_Q_ar)
def step_simple(x, A, Q, bais_latent):
innov = pm.MvNormal.dist(mu=0.0, tau=Q)
next_x = g(x) + innov
return next_x, collect_default_updates( [next_x])
ar_states_pt, ar_updates = pytensor.scan(step_simple,
outputs_info=[x0_ar],
non_sequences=[A_ar, Q_ar, bais_latent],
n_steps=T,
strict=True)
assert(ar_updates)
model_minibatch.register_rv(ar_states_pt, name='ar_states_pt', initval=pt.zeros((T, latent_factors)))
ar_states = pm.Deterministic("ar", pt.concatenate([x0_ar.reshape((1,latent_factors)), ar_states_pt], axis=0))
obs = pm.Poisson('obs', f(ar_states,z), observed=data_out, total_size=424560)
```

ar_states goes as input to f function. When I run the inference using ADVI, the estimates of ar_states do not follow the state space dynamics and they just fit the training data in whatever way. The reason being that ar_states should maximize the two likelihoods, one is for the observed data and on is for the state space dynamics. Due to large number of samples (>400k), likelihood for observed data just dominates the state space dynamics. Is there a way to extract conditional log probability of states and weigh them such that optimizer equally maximize the two log probabilities?