# Running models "backwards" to get predictions

Forgive me if these are dumb questions - I’m still learning. Here I’m modeling the data generating process for a disease.

In my dataset, I have health metrics (just 1 here for simplicity) and the disease state (true or false). If a person (sample) has the disease, the health metric changes. Otherwise the metric stays at normal levels:

```python
import pymc3 as pm
import theano.tensor as tt

with pm.Model() as model:
    p_disease = pm.Uniform('p_disease', lower=0, upper=1)
    disease = pm.Bernoulli('disease', p=p_disease, observed=has_disease)

    μ_metric1 = pm.Normal('μ_metric1', 0, sigma=10)
    ρ_metric1 = pm.Normal('ρ_metric1', 1.5, sigma=2)
    σ_metric1 = pm.HalfCauchy('σ_metric1', 1)
    data_metric1 = pm.Data('data_metric1', X_train)
    metric1 = pm.Normal('metric1',
                        # If there is disease, the metric mean is scaled by ρ;
                        # otherwise it stays at the normal level μ.
                        mu=tt.switch(tt.eq(disease, 1), ρ_metric1 * μ_metric1, μ_metric1),
                        sigma=σ_metric1,
                        observed=data_metric1)

    trace = pm.sample()
```
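In case it helps, here is roughly how data from the process I'm describing could be simulated (the parameter values below are made up for illustration; my real `X_train` and `has_disease` come from an actual dataset):

```python
import numpy as np

rng = np.random.default_rng(42)

n = 500
true_p_disease = 0.3                       # example values only
true_mu, true_rho, true_sigma = 5.0, 1.5, 1.0

# Disease state, then a metric whose mean is scaled by ρ for diseased samples
has_disease = rng.binomial(1, true_p_disease, size=n)
X_train = rng.normal(np.where(has_disease == 1, true_rho * true_mu, true_mu),
                     true_sigma)
```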

I have a few questions in order of priority:

1. Is it possible to predict whether someone has a disease (binary classification) using this model, given health metric data? Am I formulating the wrong model? Note: I have another model that uses logistic regression that works correctly, but I’m curious about this particular model.
2. If I put the observed variable `has_disease` into its own `pm.Data()`, I run into optimization issues. What’s going wrong?
3. Is there a way to handle missing data and still get predictions? `pm.Data()` doesn’t like it, and I use it during the prediction step, which looks like this:
```python
with model:
    pm.set_data({"data_metric1": X_test})
    predictions = pm.sample_posterior_predictive(trace)
```