Integration of aggressive bound tightening and Mixed Integer Programming for Cost-sensitive feature selection in medical diagnosis

Research output: Contribution to journal › Article › peer-review

8 Scopus citations

Abstract

"Silent diseases" is an umbrella term for a spectrum of chronic illnesses that produce no clinically obvious signs and are diagnosed at advanced stages, when the damage is irreversible. Current diagnostic strategies for silent diseases depend on self-reported symptoms and behavior observed over extended periods of time; to date, there are no specific clinical tests to diagnose them. Scientific research suggests that early diagnosis is important for restoring functionality and reducing disease-related complications. Previous studies have primarily focused on feature selection methods to aid medical diagnosis. Traditional feature selection methods focus on correct classification and often ignore feature costs, that is, the cost of the clinical tests required to acquire each feature value. In medical diagnosis, however, features have different associated costs. Because ignoring feature costs may result in a high-cost diagnostic strategy that cannot be used in practice, developing a low-cost diagnostic strategy remains a subject of much interest. In this paper, new Mixed Integer Programming (MIP) models, namely the Cost-sensitive Support Vector Machine (CS-SVM) and the Cost-sensitive Multi-surface Method Tree (CS-MSMT), are proposed; they allow simultaneous selection of low-cost and informative features. The CS-SVM and CS-MSMT are superior because they have the ability to account for shared costs. The CS-SVM and CS-MSMT were modified to embed shared costs across feature groups, and the modified models are termed Discounted CS-SVM (dCS-SVM) and Discounted CS-MSMT (dCS-MSMT), respectively. A computationally effective algorithm that integrates aggressive bound tightening with the MIP formulation is proposed. To demonstrate the effectiveness of the proposed models, different analysis paradigms are conducted on six UCI medical datasets: Chronic Kidney Disease, Hepatitis, Heart Disease, Thyroid, Diabetes, and Leukemia.
The results demonstrate the efficiency and robustness of the CS-SVM and CS-MSMT (and consequently the dCS-SVM and dCS-MSMT) under various conditions. On the Leukemia dataset, the CS-SVM and CS-MSMT improved accuracy by 10.3% and 3.4% and reduced costs by 94.3% and 72.4%, respectively.
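The idea of shared costs across feature groups can be illustrated with a minimal sketch. It is not the paper's MIP formulation: it assumes that each feature has its own cost, that features in the same group (for example, tests run on one blood sample) share a group cost charged only once, and it replaces the solver with brute-force enumeration over a toy problem. All feature names, costs, and the `is_accurate_enough` rule are hypothetical.

```python
from itertools import combinations

def subset_cost(selected, individual_cost, group_of, group_cost):
    """Own cost of every selected feature plus each touched group's cost, charged once."""
    own = sum(individual_cost[f] for f in selected)
    touched_groups = {group_of[f] for f in selected if f in group_of}
    return own + sum(group_cost[g] for g in touched_groups)

def cheapest_subset(features, is_accurate_enough, individual_cost, group_of, group_cost):
    """Enumerate all subsets (toy sizes only) and keep the cheapest acceptable one."""
    best = None
    for k in range(len(features) + 1):
        for subset in combinations(features, k):
            if not is_accurate_enough(subset):
                continue
            cost = subset_cost(subset, individual_cost, group_of, group_cost)
            if best is None or cost < best[0]:
                best = (cost, subset)
    return best

# Hypothetical toy data: two blood tests share one blood draw.
features = ["age", "glucose", "creatinine", "ultrasound"]
individual_cost = {"age": 0, "glucose": 2, "creatinine": 3, "ultrasound": 40}
group_of = {"glucose": "blood_draw", "creatinine": "blood_draw"}
group_cost = {"blood_draw": 10}

def is_accurate_enough(subset):
    # Hypothetical stand-in for a classifier accuracy constraint.
    chosen = set(subset)
    return "ultrasound" in chosen or {"glucose", "creatinine"} <= chosen

best = cheapest_subset(features, is_accurate_enough,
                       individual_cost, group_of, group_cost)
```

Here the two blood tests together cost 2 + 3 + 10 because the blood draw is paid once, which makes them preferable to the single more expensive test. Enumeration grows exponentially with the number of features, which is one reason an MIP formulation with a dedicated solution algorithm is attractive for real datasets.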

Original language: English
Article number: 115902
Journal: Expert Systems with Applications
Volume: 187
State: Published - Jan 2022

Keywords

  • Aggressive bound tightening
  • Cost-sensitive
  • Feature selection
  • Medical diagnosis
  • Mixed Integer Linear Programming
  • Shared costs
