TY - GEN
T1 - ICPR 2020 - Competition on Harvesting Raw Tables from Infographics
AU - Davila, Kenny
AU - Tensmeyer, Chris
AU - Shekhar, Sumit
AU - Singh, Hrituraj
AU - Setlur, Srirangaraj
AU - Govindaraju, Venu
N1 - Publisher Copyright: © 2021, Springer Nature Switzerland AG.
PY - 2021
Y1 - 2021
AB - This work summarizes the results of the second Competition on Harvesting Raw Tables from Infographics (ICPR 2020 CHART-Infographics). Chart Recognition is difficult and multifaceted, so for this competition we divide the process into the following tasks: Chart Image Classification (Task 1), Text Detection and Recognition (Task 2), Text Role Classification (Task 3), Axis Analysis (Task 4), Legend Analysis (Task 5), Plot Element Detection and Classification (Task 6.a), Data Extraction (Task 6.b), and End-to-End Data Extraction (Task 7). We provided two datasets for training and evaluation of the participant submissions. The first is based on synthetic charts (Adobe Synth) generated from real data sources using matplotlib. The second is based on manually annotated charts extracted from the Open Access section of PubMed Central (UB PMC). More than 25 teams registered, of which 7 submitted results for different tasks of the competition. While results on synthetic data are near perfect at times, the same models still have room to improve when it comes to data extraction from real charts. The data, annotation tools, and evaluation scripts have been publicly released for academic use.
KW - Chart dataset
KW - Chart recognition
KW - Competition
UR - https://www.scopus.com/pages/publications/85104408611
U2 - 10.1007/978-3-030-68793-9_27
DO - 10.1007/978-3-030-68793-9_27
M3 - Conference contribution
SN - 9783030687922
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 361
EP - 380
BT - Pattern Recognition. ICPR International Workshops and Challenges, 2021, Proceedings
A2 - Del Bimbo, Alberto
A2 - Cucchiara, Rita
A2 - Sclaroff, Stan
A2 - Farinella, Giovanni Maria
A2 - Mei, Tao
A2 - Bertini, Marco
A2 - Escalante, Hugo Jair
A2 - Vezzani, Roberto
PB - Springer Science and Business Media Deutschland GmbH
T2 - 25th International Conference on Pattern Recognition Workshops, ICPR 2020
Y2 - 10 January 2021 through 11 January 2021
ER -