Thanks for the suggestion. I’m still not convinced this is what I’m looking for.
I don’t want to regularize a specific range. As I understand it, ARD is usually used to determine the importance of each feature. So in a simple case such as linear regression, we assign a hyper-prior to the standard deviation of each coefficient to control how strongly each feature affects the predictions.
However, ℓ and η are the same for all features in the usual GP implementation. We do not assign a separate ℓ to each feature; instead we divide the Euclidean distance between two data samples (the exact operations vary with the kernel type) by a single scalar RV ℓ. So how could that control the effect of each individual feature differently, according to that feature's relevance?
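To make concrete what I mean by a single shared ℓ, here is a minimal NumPy sketch of the isotropic RBF kernel as I understand it (the function name and shapes are just illustrative):

```python
import numpy as np

def rbf_kernel_isotropic(X1, X2, lengthscale=1.0, eta=1.0):
    """Isotropic RBF kernel: one scalar lengthscale shared by all features."""
    # Squared Euclidean distance between every pair of rows
    sq_dists = (
        np.sum(X1**2, axis=1)[:, None]
        + np.sum(X2**2, axis=1)[None, :]
        - 2.0 * X1 @ X2.T
    )
    # A single scalar ell divides the whole distance,
    # so every feature dimension is scaled identically
    return eta**2 * np.exp(-0.5 * sq_dists / lengthscale**2)

X = np.random.default_rng(0).normal(size=(5, 3))
K = rbf_kernel_isotropic(X, X, lengthscale=2.0)
print(K.shape)  # (5, 5)
```

Since the one ℓ acts on the summed distance rather than on each coordinate, I don't see how it could down-weight an irrelevant feature on its own.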