In feature classification, the choice of data strongly affects the results. In a hyperspectral image, not all bands carry useful information: some bands are irrelevant, such as those affected by various atmospheric effects (see Figure.4), and they decrease the classification accuracy. There are also redundant bands, which complicate the learning system and produce incorrect predictions [14]. Even when the bands contain enough information about the scene, they may fail to predict the classes correctly if the dimension of the image space (see Figure.3) is so large that many training cases are needed to detect the relationship between the bands and the scene (Hughes phenomenon) [10]. We can reduce the dimensionality of hyperspectral images either by selecting only the relevant bands (feature selection or subset selection methodology), or by extracting from the original bands new bands that contain the maximal information about the classes, using any functions, logical or numerical (feature extraction methodology) [11][9]. Here we focus on feature selection using mutual information. Hyperspectral images have three advantages over multispectral images [6],