Feature selection with discrete distribution?

Hello, I’d like to use a bounded discrete distribution to perform a type of feature selection for a predictive model, where each draw from the discrete distribution corresponds to a feature/column in the model inputs. The columns in the model inputs have an inherent order, so using a binomial distribution, for instance, would let me tell the model something to the effect of "if column 5 is the most likely column, then columns 4 and 6 should be more likely than columns 1 or 10." Additionally, I’d like the prediction to be a weighted average of the features. Sampling from a binomial or similar distribution seems like a nice way to get a weighted average of the features that respects feature order.
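
For concreteness, here is roughly the kind of weighted average I have in mind, sketched in plain NumPy/SciPy outside of any model (the sizes and the p value are just placeholders): each column is weighted by the binomial pmf at its index, so columns near the most likely index get similar weight.

import numpy as np
from scipy.stats import binom

n_obs, n_features = 100, 10
X = np.random.randn(n_obs, n_features)

# one weight per column, summing to 1; p controls which column is most likely
p = 0.5
weights = binom.pmf(np.arange(n_features), n_features - 1, p)

# weighted average of the columns that respects their order
weighted_avg = X @ weights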

I can set up the model to do the feature selection, but once I connect it to observed data the model breaks with a difficult-to-interpret error.

Is what I’m trying to do possible? Are there other ways to enforce order in feature selection?

Thank you!

Here’s a toy example:

import numpy as np
import pymc3 as pm
from theano import shared
    
n_obs = 100
n_features = 3
X = np.random.randn(n_obs, n_features)
y = np.random.randn(n_obs)

X_shared = shared(X)

with pm.Model() as model:    
    
    # binomial distribution over the column indices 0 .. n_features-1
    selected_feature = pm.Bound(pm.Binomial, upper=n_features-1)('selected_feature', n=n_features-1, p=.5)
    
    # two scaling coefficients
    coefs = pm.Normal('coefs', mu=0, sd=1, shape=2)
    
    # coefs[0] * coefs[1] * single selected column
    estimated = coefs[0] * (coefs[1] * X_shared[:, selected_feature])
    
    # fit to observed data
    pm.Normal('y', mu=estimated, sd=1, observed=y)   
    
    trace = pm.sample(500, tune=500, njobs=1, chains=1)

Error message:

IndexError: index out of bounds
Apply node that caused the error: Subtensor{::, int64}(<TensorType(float64, matrix)>, ScalarFromTensor.0)
Toposort index: 18
Inputs types: [TensorType(float64, matrix), Scalar(int64)]
Inputs shapes: [(100, 3), ()]
Inputs strides: [(24, 8), ()]
Inputs values: ['not shown', 3]
Outputs clients: [[Elemwise{Composite{(i0 + (-sqr((i1 - (i2 * i3 * i4)))))}}(TensorConstant{(1,) of -1…3787706641}, TensorConstant{[ 0.005528…25930436]}, InplaceDimShuffle{x}.0, InplaceDimShuffle{x}.0, Subtensor{::, int64}.0)]]

Reply: This is usually not recommended, because you run into the problem of a dynamic input shape and the difficulty of doing inference on discrete parameters. You should have a look at Bayesian sparse regression: https://betanalpha.github.io/assets/case_studies/bayes_sparse_regression.html
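
For example, a plain horseshoe-style sparse regression keeps every parameter continuous, so NUTS can sample it. This is only a minimal sketch (the regularized horseshoe in the linked case study is more involved), reusing the X and y from your toy example:

import numpy as np
import pymc3 as pm

n_obs, n_features = 100, 3
X = np.random.randn(n_obs, n_features)
y = np.random.randn(n_obs)

with pm.Model() as sparse_model:
    # global and per-feature shrinkage scales (plain horseshoe prior)
    tau = pm.HalfCauchy('tau', beta=1)
    lam = pm.HalfCauchy('lam', beta=1, shape=n_features)

    # coefficients shrunk towards zero unless the data supports them
    coefs = pm.Normal('coefs', mu=0, sd=tau * lam, shape=n_features)
    intercept = pm.Normal('intercept', mu=0, sd=1)

    mu = intercept + pm.math.dot(X, coefs)
    pm.Normal('y', mu=mu, sd=1, observed=y)

    trace = pm.sample(500, tune=500, chains=1)

Note that this does not encode the ordering of the columns; it only shrinks irrelevant coefficients towards zero, so if the order structure matters you would still need to express it through the prior.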