Classification with weighted samples: lda and synthetic data

timzen · January 3, 2023, 3:28pm

Continuing the discussion from How to run logistic regression with weighted samples:

After the problem in the linked thread was solved, several further questions remained. I think these questions should be addressed on their own.

As I understand it, generating synthetic data is not possible with the proposed logistic regression model. If that is desired, I would rather use a generative model based on linear discriminant analysis. Here is an example I have come up with on my own so far. I used the well-known iris dataset so that the example is easy to reproduce:

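import arviz as az
import pandas as pd
import pymc as pm
import seaborn as sns

# Keep two of the three iris species and encode them as integer labels.
df = (
    sns.load_dataset("iris")
    [lambda df: df.species.isin(("setosa", "versicolor"))]
    .assign(
        label = lambda df: pd.Categorical(df.species).codes
    )
)
input_cols = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
seed = 42

with pm.Model() as model:
    # One standard deviation per feature, shared by both classes.
    sigma = pm.HalfNormal("sigma", sigma=10, shape=len(input_cols))
    # One mean vector per class: row 0 for setosa, row 1 for versicolor.
    mu = pm.Normal("mu", mu=0, sigma=10, shape=(2, len(input_cols)))

    # Independent (decoupled) Gaussians per feature for each class.
    setosa = pm.Normal(
        "setosa",
        mu=mu[0],
        sigma=sigma,
        observed=df[df.species=="setosa"][input_cols].to_numpy()
    )

    versicolor = pm.Normal(
        "versicolor",
        mu=mu[1],
        sigma=sigma,
        observed=df[df.species=="versicolor"][input_cols].to_numpy()
    )
    trace = pm.sample(1000, random_seed=seed)

# to_markdown() requires the tabulate package.
summary = az.summary(trace)
print(summary.to_markdown())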
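
To make the "synthetic samples" idea concrete, here is a minimal NumPy sketch of how new rows could be drawn once posterior draws for one class are available: pick a posterior draw at random, then sample every feature from its Gaussian. The function name draw_synthetic and the fake_mu / fake_sigma arrays are hypothetical stand-ins for flattened posterior draws of mu[0] and sigma; they are not results from the model above. In PyMC itself, pm.sample_posterior_predictive is presumably the more direct route.

```python
import numpy as np

def draw_synthetic(mu_draws, sigma_draws, n, rng):
    """Draw n synthetic rows from independent per-feature Gaussians.

    mu_draws, sigma_draws: arrays of shape (n_draws, n_features),
    one row per posterior draw.
    """
    mu_draws = np.asarray(mu_draws, dtype=float)
    sigma_draws = np.asarray(sigma_draws, dtype=float)
    # Pick one posterior draw per synthetic row, so that parameter
    # uncertainty carries over into the generated data.
    idx = rng.integers(0, len(mu_draws), size=n)
    return rng.normal(mu_draws[idx], sigma_draws[idx])

rng = np.random.default_rng(42)
# Placeholder "posterior draws" (assumption, NOT output of the model above).
fake_mu = rng.normal(5.0, 0.05, size=(200, 4))
fake_sigma = rng.uniform(0.2, 0.4, size=(200, 4))

synthetic_rows = draw_synthetic(fake_mu, fake_sigma, n=5, rng=rng)
print(synthetic_rows.shape)  # (5, 4)
```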

So here are my remaining questions:

Is this the correct idea to generate synthetic samples?
In this model the Gaussians for the different features are decoupled. What would a model using a multivariate (coupled) Gaussian look like?
Is it possible to include weights for the samples here as well?
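
To illustrate what the second and third questions are asking for, here is a rough sketch in plain NumPy rather than PyMC. It is a weighted plug-in estimate, not a Bayesian model: each class gets a weighted mean, all classes share one full (coupled) covariance matrix as in linear discriminant analysis, and synthetic rows are drawn through a Cholesky factor. The names fit_weighted_lda and sample_class, the toy data, and the weights are all hypothetical. In PyMC the analogous pieces would presumably be pm.MvNormal with a pm.LKJCholeskyCov prior for the coupling, and pm.Potential for weighted log-likelihood terms, but that is an assumption and untested here.

```python
import numpy as np

def fit_weighted_lda(X, y, w):
    """Weighted class means plus one pooled full covariance matrix."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    w = np.asarray(w, dtype=float)
    means = {
        c: np.average(X[y == c], axis=0, weights=w[y == c])
        for c in np.unique(y)
    }
    # Center every row on its own class mean, then pool the weighted scatter.
    centered = X - np.stack([means[c] for c in y])
    cov = (w[:, None] * centered).T @ centered / w.sum()
    return means, cov

def sample_class(mean, cov, n, rng):
    """Draw n rows from N(mean, cov) using the Cholesky factor of cov."""
    chol = np.linalg.cholesky(cov)
    z = rng.standard_normal((n, len(mean)))
    return mean + z @ chol.T

rng = np.random.default_rng(42)
# Toy two-class, two-feature data (assumption, not the iris dataset).
X = np.vstack([
    rng.normal([5.0, 3.4], 0.3, size=(50, 2)),
    rng.normal([5.9, 2.8], 0.3, size=(50, 2)),
])
y = np.repeat([0, 1], 50)
w = rng.uniform(0.5, 2.0, size=len(y))  # hypothetical sample weights

means, cov = fit_weighted_lda(X, y, w)
synthetic = sample_class(means[0], cov, n=10, rng=rng)
print(cov.shape, synthetic.shape)  # (2, 2) (10, 2)
```

Setting all weights to 1 reduces the estimates to the ordinary class means and the pooled maximum-likelihood covariance, which is a quick sanity check on the weighting.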
