# Classification with weighted samples: LDA and synthetic data

Continuing the discussion from How to run logistic regression with weighted samples:

After the problem in the linked thread was solved, several further questions came up. I think these questions should be addressed on their own.

As I understand it, generating synthetic data is not possible with the proposed logistic regression model, because it only models the labels given the features and not the features themselves. If synthetic data is desired, I would rather use a generative model along the lines of linear discriminant analysis. Here is an example I have come up with on my own so far. I used the well-known iris dataset so that the example is easy to reproduce:

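``````python
import arviz as az
import pandas as pd
import pymc as pm
import seaborn as sns

# Assumption: the data source was missing from the original snippet;
# seaborn's bundled iris copy matches the column names used below.
df = (
    sns.load_dataset("iris")
    .loc[lambda df: df.species.isin(("setosa", "versicolor"))]
    .assign(label=lambda df: pd.Categorical(df.species).codes)
)
input_cols = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
seed = 42

with pm.Model() as model:
    # One standard deviation per feature, shared by both classes
    sigma = pm.HalfNormal("sigma", sigma=10, shape=len(input_cols))
    # One mean vector per class: row 0 is setosa, row 1 is versicolor
    mu = pm.Normal("mu", mu=0, sigma=10, shape=(2, len(input_cols)))

    setosa = pm.Normal(
        "setosa",
        mu=mu[0],
        sigma=sigma,
        observed=df[df.species == "setosa"][input_cols].to_numpy(),
    )

    versicolor = pm.Normal(
        "versicolor",
        mu=mu[1],
        sigma=sigma,
        observed=df[df.species == "versicolor"][input_cols].to_numpy(),
    )
    trace = pm.sample(1000, random_seed=seed)

summary = az.summary(trace)
print(summary.to_markdown())
``````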
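
To make the "synthetic data" idea concrete, here is a minimal NumPy sketch of how samples could be drawn once posterior draws of `mu` and `sigma` are available. The function name `draw_synthetic`, the equal class probabilities, and the stand-in arrays are assumptions for illustration only; they are not part of the model above.

```python
import numpy as np

def draw_synthetic(mu_draws, sigma_draws, n_samples, class_probs=(0.5, 0.5), seed=42):
    """Draw synthetic (features, label) pairs from a diagonal-Gaussian class model.

    mu_draws:    (n_draws, n_classes, n_features) posterior draws of the class means
    sigma_draws: (n_draws, n_features) posterior draws of the shared std. dev.
    """
    rng = np.random.default_rng(seed)
    mu_draws = np.asarray(mu_draws)
    sigma_draws = np.asarray(sigma_draws)
    # Use one random posterior draw per synthetic point so that
    # parameter uncertainty carries over into the generated data
    draw_idx = rng.integers(0, mu_draws.shape[0], size=n_samples)
    # Class labels come from assumed class probabilities (hypothetical choice)
    labels = rng.choice(len(class_probs), size=n_samples, p=class_probs)
    means = mu_draws[draw_idx, labels]  # (n_samples, n_features)
    sds = sigma_draws[draw_idx]         # (n_samples, n_features)
    features = rng.normal(means, sds)
    return features, labels

# Stand-in posterior draws (made-up numbers, only to make the sketch runnable)
rng = np.random.default_rng(0)
fake_mu = rng.normal(
    [[5.0, 3.4, 1.5, 0.2], [5.9, 2.8, 4.3, 1.3]], 0.05, size=(200, 2, 4)
)
fake_sigma = np.abs(rng.normal(0.4, 0.02, size=(200, 4)))

X_syn, y_syn = draw_synthetic(fake_mu, fake_sigma, n_samples=10)
print(X_syn.shape, y_syn.shape)
```

With the model above, the two arrays would come from the `mu` and `sigma` variables in `trace.posterior`, with the chain and draw dimensions flattened into one.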

So here are my remaining questions:

1. Is this the right approach for generating synthetic samples?
2. In the model here, the Gaussians for the different features are decoupled. What would a model using a multivariate (coupled) Gaussian look like?
3. Is it possible to include weights for the samples here as well?
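
To illustrate what questions 2 and 3 are asking for, here is a rough plug-in sketch in plain NumPy, not a PyMC model. It assumes that a sample weight simply scales that sample's log-likelihood contribution, and that both classes share one full covariance matrix as in classical LDA. The helper names `weighted_lda_fit` and `weighted_loglik` and the toy data are made up for this sketch. In PyMC the analogous ingredients would presumably be `pm.MvNormal` with a prior on the covariance and a `pm.Potential` term for the weighted log-likelihood, but I have not verified that here.

```python
import numpy as np

def weighted_lda_fit(X, y, w):
    """Weighted class means and a pooled (shared) full covariance matrix."""
    X, y, w = np.asarray(X, float), np.asarray(y), np.asarray(w, float)
    means = {}
    pooled = np.zeros((X.shape[1], X.shape[1]))
    for c in np.unique(y):
        mask = y == c
        wc = w[mask]
        means[c] = np.average(X[mask], axis=0, weights=wc)
        centered = X[mask] - means[c]
        # Each sample contributes its outer product scaled by its weight
        pooled += (wc[:, None] * centered).T @ centered
    pooled /= w.sum()
    return means, pooled

def weighted_loglik(X, y, w, means, cov):
    """sum_i w_i * log N(x_i | mu_{y_i}, cov); a weight of 2 counts a sample twice."""
    X = np.asarray(X, float)
    diff = X - np.stack([means[c] for c in np.asarray(y)])
    k = X.shape[1]
    _, logdet = np.linalg.slogdet(cov)
    # Squared Mahalanobis distance of every sample to its class mean
    maha = np.einsum("ij,ij->i", diff @ np.linalg.inv(cov), diff)
    per_sample = -0.5 * (k * np.log(2 * np.pi) + logdet + maha)
    return float(np.sum(np.asarray(w, float) * per_sample))

# Toy two-class data with arbitrary weights (made up for illustration)
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 1.0, size=(30, 2)), rng.normal(3.0, 1.0, size=(30, 2))])
y = np.repeat([0, 1], 30)
w = rng.uniform(0.5, 2.0, size=60)

means, cov = weighted_lda_fit(X, y, w)
# Coupled synthetic samples for class 1, drawn with the full covariance
synthetic = rng.multivariate_normal(means[1], cov, size=5)
print(synthetic.shape, weighted_loglik(X, y, w, means, cov))
```

Under this convention, giving a sample a weight of 2 has the same effect as including it twice with weight 1, which is one way to sanity-check a weighted model.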