TY - GEN
T1 - Word Break on SLP-Compressed Texts
AU - De, Rajat
AU - Kempa, Dominik
N1 - Publisher Copyright: © 2025 IEEE.
PY - 2025
Y1 - 2025
N2 - Word Break is a prototypical factorization problem in string processing: Given a word w of length N and a dictionary D = {d1, d2,... , dK} of K strings, determine whether we can partition w into words from D. We propose the first algorithm that solves the Word Break problem over the SLP-compressed input text w. Specifically, we show that, given the string w represented using an SLP of size g, we can solve the Word Break problem in O(g·m + M) time, where m = maxK i=1 |di |, M = PK i=1 |di |, and = 2 is the matrix multiplication exponent. We obtain our algorithm as a simple corollary of a more general result: We show that in O(gm + M) time, we can index the input text w so that solving the Word Break problem for any of its substrings takes O(m2 log N) time (independent of the substring length). Our second contribution is a lower bound: We prove that, unless the Combinatorial k-Clique Conjecture fails, there is no combinatorial algorithm for Word Break on SLP-compressed strings running in O(g·m2-ϵ + M) time for any ϵ > 0.
AB - Word Break is a prototypical factorization problem in string processing: Given a word w of length N and a dictionary D = {d1, d2,... , dK} of K strings, determine whether we can partition w into words from D. We propose the first algorithm that solves the Word Break problem over the SLP-compressed input text w. Specifically, we show that, given the string w represented using an SLP of size g, we can solve the Word Break problem in O(g·m + M) time, where m = maxK i=1 |di |, M = PK i=1 |di |, and = 2 is the matrix multiplication exponent. We obtain our algorithm as a simple corollary of a more general result: We show that in O(gm + M) time, we can index the input text w so that solving the Word Break problem for any of its substrings takes O(m2 log N) time (independent of the substring length). Our second contribution is a lower bound: We prove that, unless the Combinatorial k-Clique Conjecture fails, there is no combinatorial algorithm for Word Break on SLP-compressed strings running in O(g·m2-ϵ + M) time for any ϵ > 0.
KW - grammar compression
KW - slp
KW - word break
UR - https://www.scopus.com/pages/publications/105006841111
U2 - 10.1109/DCC62719.2025.00023
DO - 10.1109/DCC62719.2025.00023
M3 - Conference contribution
T3 - Data Compression Conference Proceedings
SP - 153
EP - 162
BT - Proceedings - DCC 2025
A2 - Bilgin, Ali
A2 - Fowler, James E.
A2 - Serra-Sagrista, Joan
A2 - Ye, Yan
A2 - Storer, James A.
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2025 Data Compression Conference, DCC 2025
Y2 - 18 March 2025 through 21 March 2025
ER -