TY - GEN
T1 - Parsing XML using parallel traversal of streaming trees
AU - Pan, Yinfei
AU - Zhang, Ying
AU - Chiu, Kenneth
PY - 2008
Y1 - 2008
N2 - XML has been widely adopted across a wide spectrum of applications. Its parsing efficiency, however, remains a concern, and can be a bottleneck. With the current trend towards multicore CPUs, parallelization to improve performance is increasingly relevant. In many applications, the XML is streamed from the network, and thus the complete XML document is never in memory at any single moment in time. Parallel parsing of such a stream can be equated to parallel depth-first traversal of a streaming tree. Existing research on parallel tree traversal has assumed the entire tree was available in-memory, and thus cannot be directly applied. In this paper we investigate parallel, SAX-style parsing of XML via a parallel, depth-first traversal of the streaming document. We show good scalability up to about 6 cores on a Linux platform.
UR - https://www.scopus.com/pages/publications/58449123305
U2 - 10.1007/978-3-540-89894-8_16
DO - 10.1007/978-3-540-89894-8_16
M3 - Conference contribution
SN - 354089893X
SN - 9783540898931
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 142
EP - 156
BT - High Performance Computing - HiPC 2008 - 15th International Conference, Proceedings
PB - Springer Verlag
T2 - 15th International Conference on High Performance Computing, HiPC 2008
Y2 - 17 December 2008 through 20 December 2008
ER -