Skip to main navigation Skip to search Skip to main content

Word Break on SLP-Compressed Texts

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

Abstract

Word Break is a prototypical factorization problem in string processing: Given a word w of length N and a dictionary D = {d1, d2,... , dK} of K strings, determine whether we can partition w into words from D. We propose the first algorithm that solves the Word Break problem over the SLP-compressed input text w. Specifically, we show that, given the string w represented using an SLP of size g, we can solve the Word Break problem in O(g·m + M) time, where m = maxK i=1 |di |, M = PK i=1 |di |, and = 2 is the matrix multiplication exponent. We obtain our algorithm as a simple corollary of a more general result: We show that in O(gm + M) time, we can index the input text w so that solving the Word Break problem for any of its substrings takes O(m2 log N) time (independent of the substring length). Our second contribution is a lower bound: We prove that, unless the Combinatorial k-Clique Conjecture fails, there is no combinatorial algorithm for Word Break on SLP-compressed strings running in O(g·m2-ϵ + M) time for any ϵ > 0.

Original languageEnglish
Title of host publicationProceedings - DCC 2025
Subtitle of host publication2025 Data Compression Conference
EditorsAli Bilgin, James E. Fowler, Joan Serra-Sagrista, Yan Ye, James A. Storer
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages153-162
Number of pages10
ISBN (Electronic)9798331534714
DOIs
StatePublished - 2025
Event2025 Data Compression Conference, DCC 2025 - Snowbird, United States
Duration: Mar 18 2025Mar 21 2025

Publication series

NameData Compression Conference Proceedings

Conference

Conference2025 Data Compression Conference, DCC 2025
Country/TerritoryUnited States
CitySnowbird
Period03/18/2503/21/25

Keywords

  • grammar compression
  • slp
  • word break

Fingerprint

Dive into the research topics of 'Word Break on SLP-Compressed Texts'. Together they form a unique fingerprint.

Cite this