Multilinear regression with mixture of gaussians

david · July 24, 2024, 7:51am

Dear all, sofar I succesfully implented to my problem robust multilinear regression according your examples. I would like to kindly ask you if you can support me with simple example how to do a multilinear regression with mixture of gaussians. I am not statistician specialist therefore please accept my appology if I do not use exact terms. Please, find atached the link on thesis where this regression example is perfomed as example to understand what I would like to try with you kind support in pymc. Is it even possible to do that ? The attachment is attached in order to clarify what I would like to achieve but with simple example in pymc. https://www.stat.rice.edu/~hgsung/thesis.pdf

Many thanks in advance for your advice and suggestions with this topic.

jessegrabowski · July 24, 2024, 8:52am

Hi David,

The linked thesis is 100 pages long with many different models, so it’s difficult to know what you’re trying to do. Perhaps you could write out the equations for the model you’re trying to implement, or show some code that you’ve already tried, even if it didn’t work? This would help clarify the situation.

david · July 24, 2024, 12:05pm

import matplotlib.pyplot as plt
import numpy as np

The function bellow serves to generat random values for a given mean and amount of samples n


def generate_random_values(n, mean, low, high):

    # Generate initial random values using normal distribution
    values = np.random.normal(loc=mean, scale=1.0, size=n)
    
    # Clip values to the specified range
    values = np.clip(values, low, high)
    
    return values

function which is used to generate seeds

def generate_seeds_matrix(rows_seed, cols_seed):

    low = 6
    high = 1e7
    return np.random.randint(low, high, size=(rows_seed, cols_seed))

following function bellow is employed to generate X inputs

def generate_X(cols_seed):

    appended_array = np.array([])
    n_samples = np.array([10, 50, 12, 30, 4, 45])
    mean      = np.random.normal(loc=0.7, scale=1.0, size=np.size(n_samples))#np.array([0.75, 0.60, 0.80, 0.70, 0.5, 0.55])
    low       = np.array([0.30, 0.25, 0.55, 0.17, 0.35, 0.30])
    high       = np.array([1.2, 1.7, 2.2, 1.2, 1.9, 2])
    for i, _ in enumerate(n_samples):
        array = generate_random_values(n_samples[i], mean[i], low[i], high[i])
        if i ==0 or i==2:
            appended_array = np.concatenate((appended_array, array))
        else:
            #array = np.full((n_samples[i]), array[-1])
            array = np.full((n_samples[i]), 0.1*array)
            appended_array = np.concatenate((appended_array, array))
    return appended_array

last function serves to simulate observed values

def y_observed(seed_matrix, rows_seed):

    X_list = [generate_X(seed_i) for seed_i in seed_matrix]
    X = np.array(X_list)
    slopes =  np.random.uniform(-0.5, 0.7, rows_seed)
    alpha  = 1.5 
    return np.dot(slopes, np.log(X)) + alpha

part of the script bellow is used to display X, Y observed and distribution of observed data

#########################
np.random.seed(123654)
rows_seed = 3 # Xi inputs
cols_seed = 6 # fixed amount of mixed arrays, hardcoded


seed_matrix = generate_seeds_matrix(rows_seed, cols_seed)
for seed_i in (seed_matrix):
    plt.plot(generate_X(seed_i))
plt.title("generated X")
plt.show()

y = y_observed(seed_matrix, rows_seed)
plt.plot(y)
plt.title("observed Y")
plt.show()

az.plot_kde(y)
plt.title("observed Y dist")
plt.show()

The target is to perform the regression with mixtures of gaussians applied on the equation

y = alpha + beta1*x1 + beta2*x2 + beta3*x3

the worst thing is I even dont know how to start to create the model. As I mentioned before the multilinear robuts and non robust regression I was able to follow and apply it but with this approach I dont know, therfore I would like to kindly ask for your guidance and support.

david · July 24, 2024, 1:30pm

I will add additional example of approach I would like to try and apply it on the data above

https://pubs.acs.org/doi/10.1021/acsearthspacechem.1c00217

Topic		Replies	Views
Gaussian Mixture + Linear Regression for non-linear data Questions	0	532	February 4, 2020
Mixing/sampling coefficients from multiple linear regressions v5	0	291	August 19, 2022
Mixture regression Questions	2	994	January 29, 2019
2D Gaussian Mixture Model Questions	7	2075	August 25, 2017
Mixed multivariate Gauss distribution Questions	39	4815	April 12, 2018

Multilinear regression with mixture of gaussians

Related topics