This paper proposes to learn the relevant features of remote sensing images for automatic spatio-spectral classification with the automatic optimization of multiple kernels. The method consists of building dedicated kernels for different sets of bands, contextual or textural features. The optimal linear combination of kernels is optimized through gradient descent on the support vector machine (SVM) objective function. Since a naïve implementation is computationally demanding, we propose an efficient model selection procedure based on kernel alignment. The result is a weight -learned from the data- for each kernel where both relevant and meaningless image features emerge after training. Excellent results are observed in both multi and hyperspectral image classification, improving standard SVM and other spatio-spectral formulations.