Evaluating and selecting features via information theoretic lower bounds of feature inner correlations for high-dimensional data

Zhang, Yishi and Zhu, Ruilin and Chen, Zhijun and Gao, Jie and Xia, De (2020) Evaluating and selecting features via information theoretic lower bounds of feature inner correlations for high-dimensional data. European Journal of Operational Research, 290 (1). pp. 235-247. ISSN 0377-2217

Text (EOR16774)
EOR16774.pdf - Accepted Version
Available under License Creative Commons Attribution-NonCommercial-NoDerivs.


Abstract

Feature selection is an important preprocessing step, and an aid to interpretability, in fields where big data plays an essential role. In this paper, we first reformulate and analyze some representative information theoretic feature selection methods from the perspective of approximations of feature inner correlations, and show that many of these methods cannot guarantee any theoretical bounds on feature inner correlations. We then introduce two lower bounds, both with very simple forms, for feature redundancy and complementarity, and verify that they are closer to the optima than the existing lower bounds applied by some state-of-the-art information theoretic methods. Based on these lower bounds, we propose a simple and effective feature selection method and verify it empirically on a wide range of real-world datasets. The experimental results show that the method achieves a promising improvement in feature selection, indicating the effectiveness of a feature criterion built from the proposed lower bounds of redundancy and complementarity.
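
For readers unfamiliar with how redundancy and complementarity enter an information theoretic criterion, the following is a minimal sketch of a generic greedy selector. It does not implement the lower bounds proposed in the paper, whose forms are not given in this abstract. It assumes instead the well-known joint mutual information (JMI) style score, relevance minus average redundancy plus average complementarity, and the function names (entropy, mutual_info, cond_mutual_info, greedy_select) and the toy data are hypothetical, chosen only for illustration.

```python
from collections import Counter
from math import log2


def entropy(*columns):
    """Joint Shannon entropy (in bits) of one or more discrete columns."""
    rows = list(zip(*columns))
    n = len(rows)
    return -sum((c / n) * log2(c / n) for c in Counter(rows).values())


def mutual_info(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(x) + entropy(y) - entropy(x, y)


def cond_mutual_info(x, y, z):
    """I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)."""
    return entropy(x, z) + entropy(y, z) - entropy(x, y, z) - entropy(z)


def greedy_select(features, target, k):
    """Greedily pick k feature names using a JMI-style score (an assumption,
    not the criterion from the paper)."""
    selected = []
    remaining = list(features)
    while remaining and len(selected) < k:

        def score(name):
            f = features[name]
            relevance = mutual_info(f, target)
            if not selected:
                return relevance
            # Redundancy: information shared with already selected features.
            redundancy = sum(mutual_info(f, features[s]) for s in selected)
            # Complementarity: information shared only once the class is known.
            complementarity = sum(
                cond_mutual_info(f, features[s], target) for s in selected
            )
            return relevance - (redundancy - complementarity) / len(selected)

        best = max(remaining, key=score)  # ties go to the earliest candidate
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: a_copy duplicates a, while b carries independent information.
features = {
    "a": [0, 0, 1, 1],
    "a_copy": [0, 0, 1, 1],
    "b": [0, 1, 0, 1],
}
target = [0, 1, 2, 3]  # fully determined by the pair (a, b)

print(greedy_select(features, target, 2))  # ['a', 'b']
```

In this toy case all three features are equally relevant on their own, but once a is selected its duplicate scores zero because its redundancy cancels its relevance, so b is chosen second. Entropies here are plug-in estimates from counts, which is only reasonable for discrete features with enough samples per value combination.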

Item Type:
Journal Article
Journal or Publication Title:
European Journal of Operational Research
Additional Information:
This is the author’s version of a work that was accepted for publication in European Journal of Operational Research. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in European Journal of Operational Research, 290, 1, 2020 DOI: 10.1016/j.ejor.2020.09.028
Uncontrolled Keywords:
/dk/atira/pure/subjectarea/asjc/2600/2611
Subjects:
data mining; feature selection; redundancy; complementarity; lower bounds; modelling and simulation; management science and operations research; information systems and management
ID Code:
149233
Deposited By:
Deposited On:
19 Nov 2020 16:05
Refereed?:
Yes
Published?:
Published
Last Modified:
15 Apr 2024 00:11