TY - GEN
T1 - Breast cancer prediction using data mining method
AU - Wang, Haifeng
AU - Yoon, Sang Won
PY - 2015
Y1 - 2015
N2 - This paper studies breast cancer prediction with data mining methods, aiming to compare candidate models and identify an accurate one for predicting the incidence of breast cancer from patients' clinical records. Four data mining models are applied: support vector machine (SVM), artificial neural network (ANN), Naive Bayes classifier, and AdaBoost tree. The feature space is discussed in detail because it strongly influences the efficiency and effectiveness of the learning process. To test the influence of feature space reduction, a hybrid of principal component analysis (PCA) and the related data mining models is proposed, in which PCA is used to reduce the feature space. The models are evaluated on two widely used data sets, Wisconsin Breast Cancer Database (1991) and Wisconsin Diagnostic Breast Cancer (1995), and 10-fold cross-validation is used to estimate the test error of each model. The results demonstrate the trade-offs among these strategies and provide a detailed evaluation of the models. It is expected that, in real applications, physicians and patients can benefit from the feature recognition outcome to prevent breast cancer.
AB - This paper studies breast cancer prediction with data mining methods, aiming to compare candidate models and identify an accurate one for predicting the incidence of breast cancer from patients' clinical records. Four data mining models are applied: support vector machine (SVM), artificial neural network (ANN), Naive Bayes classifier, and AdaBoost tree. The feature space is discussed in detail because it strongly influences the efficiency and effectiveness of the learning process. To test the influence of feature space reduction, a hybrid of principal component analysis (PCA) and the related data mining models is proposed, in which PCA is used to reduce the feature space. The models are evaluated on two widely used data sets, Wisconsin Breast Cancer Database (1991) and Wisconsin Diagnostic Breast Cancer (1995), and 10-fold cross-validation is used to estimate the test error of each model. The results demonstrate the trade-offs among these strategies and provide a detailed evaluation of the models. It is expected that, in real applications, physicians and patients can benefit from the feature recognition outcome to prevent breast cancer.
KW - Breast cancer prediction
KW - Data mining
KW - K-fold cross-validation
UR - https://www.scopus.com/pages/publications/84970967270
M3 - Conference contribution
T3 - IIE Annual Conference and Expo 2015
SP - 818
EP - 828
BT - IIE Annual Conference and Expo 2015
PB - Institute of Industrial Engineers
T2 - IIE Annual Conference and Expo 2015
Y2 - 30 May 2015 through 2 June 2015
ER -