TY - GEN
T1 - An Expert-driven Computer-aided Classification for Database Construction
T2 - 2020 IEEE Nuclear Science Symposium and Medical Imaging Conference, NSS/MIC 2020
AU - Ng, Kenneth
AU - Wang, Luhao
AU - Pomeroy, Marc J.
AU - Cao, Weiguo
AU - Gao, Yongfeng
AU - Liang, Zhengrong
N1 - Publisher Copyright: © 2020 IEEE
PY - 2020
Y1 - 2020
N2 - Data preparation is of utmost importance for any machine learning process to produce consistent and robust results. Identifying colon polyps in computed tomographic colonography (CTC) images is challenging because some polyps lack a definite outline, some are coated by oral tagging material due to poor preparation, and annotations vary when multiple contributors work on the same dataset. This study aims to relieve these challenges with an iterative, user-driven procedure. An expert first draws the initial borders of the colon polyps (the volumes of interest, or VOIs), and computer-aided classification (CAC) is then applied to adequately grouped VOIs to find outliers. The expert examines the outliers to refine the VOIs, and the CAC is repeated on the refined VOIs. The procedure is repeated until a threshold is satisfied. The expert-driven CAC procedure was validated by experiments on three datasets: one small dataset containing 87 large polyp masses, and two large datasets containing 726 and 563 polyp masses of medium and small size. In the 87-polyp dataset, 63 VOIs (31 benign and 32 malignant) had been constructed previously by three experts and served as the baseline. The remaining 24 (12 benign and 12 malignant) were added by a single expert through the expert-driven CAC procedure. The two large datasets had multiple contributors; each could be split into several subgroups and cross-validated using the highest-performing subgroup as the baseline. Cross-validation used grey-level co-occurrence measures of the VOIs, two-fold validation, and a random forest classifier. The AUC score on the large polyp dataset remained the same as that of the baseline when the 24 new VOIs were added using the expert-driven CAC procedure, whereas it varied by 4% when the procedure was not used. The AUC scores on the medium and small polyp datasets increased nominally, by up to 2%, after the expert-driven CAC procedure. Further examination attributed this variation of up to 2% to flat small polyps and to small polyps submerged in and/or surrounded by oral tagging material. These causes are specific to CTC data, and the variation is acceptable. In conclusion, expert-driven CAC is important for large database construction.
AB - Data preparation is of utmost importance for any machine learning process to produce consistent and robust results. Identifying colon polyps in computed tomographic colonography (CTC) images is challenging because some polyps lack a definite outline, some are coated by oral tagging material due to poor preparation, and annotations vary when multiple contributors work on the same dataset. This study aims to relieve these challenges with an iterative, user-driven procedure. An expert first draws the initial borders of the colon polyps (the volumes of interest, or VOIs), and computer-aided classification (CAC) is then applied to adequately grouped VOIs to find outliers. The expert examines the outliers to refine the VOIs, and the CAC is repeated on the refined VOIs. The procedure is repeated until a threshold is satisfied. The expert-driven CAC procedure was validated by experiments on three datasets: one small dataset containing 87 large polyp masses, and two large datasets containing 726 and 563 polyp masses of medium and small size. In the 87-polyp dataset, 63 VOIs (31 benign and 32 malignant) had been constructed previously by three experts and served as the baseline. The remaining 24 (12 benign and 12 malignant) were added by a single expert through the expert-driven CAC procedure. The two large datasets had multiple contributors; each could be split into several subgroups and cross-validated using the highest-performing subgroup as the baseline. Cross-validation used grey-level co-occurrence measures of the VOIs, two-fold validation, and a random forest classifier. The AUC score on the large polyp dataset remained the same as that of the baseline when the 24 new VOIs were added using the expert-driven CAC procedure, whereas it varied by 4% when the procedure was not used. The AUC scores on the medium and small polyp datasets increased nominally, by up to 2%, after the expert-driven CAC procedure. Further examination attributed this variation of up to 2% to flat small polyps and to small polyps submerged in and/or surrounded by oral tagging material. These causes are specific to CTC data, and the variation is acceptable. In conclusion, expert-driven CAC is important for large database construction.
KW - CAC
KW - CTC
KW - Colon polyps
KW - Database
KW - ML
UR - https://www.scopus.com/pages/publications/85124703150
U2 - 10.1109/NSS/MIC42677.2020.9507990
DO - 10.1109/NSS/MIC42677.2020.9507990
M3 - Conference contribution
T3 - 2020 IEEE Nuclear Science Symposium and Medical Imaging Conference, NSS/MIC 2020
BT - 2020 IEEE Nuclear Science Symposium and Medical Imaging Conference, NSS/MIC 2020
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 31 October 2020 through 7 November 2020
ER -