Clustering Document Parts: Detecting and Characterizing Influence Campaigns from Documents

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

Abstract

We propose a novel clustering pipeline to detect and characterize influence campaigns from documents. This approach clusters parts of documents, detects clusters that likely reflect an influence campaign, and then identifies documents linked to an influence campaign via their association with the high-influence clusters. Our approach outperforms both direct document-level classification and direct document-level clustering in predicting whether a document is part of an influence campaign. We propose several novel techniques to enhance the pipeline, including using an existing event factuality prediction system to obtain document parts and aggregating multiple clustering experiments to improve the performance of both cluster and document classification. Classifying documents after clustering not only accurately extracts the parts of documents that are relevant to influence campaigns but also captures influence campaigns as a coordinated and holistic phenomenon. Our approach enables more fine-grained and interpretable characterizations of influence campaigns from documents.
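The last two stages described in the abstract (scoring clusters, then linking documents to the high-influence clusters) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the document parts have already been assigned to clusters by some clustering algorithm, and it substitutes a simple rule for the cluster classifier, treating a cluster as high-influence when more than a threshold share of its labelled parts come from known campaign documents. The function names, the `threshold` parameter, and the toy data are all hypothetical.

```python
from collections import defaultdict

def high_influence_clusters(part_cluster, part_doc, train_labels, threshold=0.5):
    """Return ids of clusters whose share of parts from campaign-labelled
    training documents exceeds `threshold` (an assumed stand-in rule)."""
    total = defaultdict(int)
    positive = defaultdict(int)
    for part, cluster in part_cluster.items():
        doc = part_doc[part]
        if doc not in train_labels:
            continue  # parts of unlabelled documents do not vote
        total[cluster] += 1
        positive[cluster] += int(train_labels[doc])
    return {c for c in total if positive[c] / total[c] > threshold}

def flag_documents(part_cluster, part_doc, hot_clusters):
    """Map each flagged document to its parts that fall in a high-influence cluster."""
    flagged = defaultdict(list)
    for part, cluster in part_cluster.items():
        if cluster in hot_clusters:
            flagged[part_doc[part]].append(part)
    return dict(flagged)

# Toy data (hypothetical): part -> document, part -> precomputed cluster id,
# and campaign labels for the training documents only.
part_doc = {"p1": "d1", "p2": "d1", "p3": "d2", "p4": "d3", "p5": "d4"}
part_cluster = {"p1": 0, "p2": 1, "p3": 0, "p4": 1, "p5": 0}
train_labels = {"d1": True, "d2": True, "d3": False}

hot = high_influence_clusters(part_cluster, part_doc, train_labels)
flagged = flag_documents(part_cluster, part_doc, hot)
print(hot)      # {0}
print(flagged)  # {'d1': ['p1'], 'd2': ['p3'], 'd4': ['p5']}
```

In this toy run the unlabelled document `d4` is flagged because its only part lands in cluster 0, and the returned parts show which passages triggered the flag. That is the kind of part-level interpretability the abstract points to.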

Original language: English
Title of host publication: NLP+CSS 2024 - 6th Workshop on Natural Language Processing and Computational Social Science, Proceedings of the Workshop
Editors: Dallas Card, Anjalie Field, Dirk Hovy, Katherine Keith
Publisher: Association for Computational Linguistics (ACL)
Pages: 132-143
Number of pages: 12
ISBN (Electronic): 9798891761124
State: Published - 2024
Event: 6th Workshop on Natural Language Processing and Computational Social Science, NLP+CSS 2024 - Mexico City, Mexico
Duration: Jun 21 2024 → …

Publication series

Name: NLP+CSS 2024 - 6th Workshop on Natural Language Processing and Computational Social Science, Proceedings of the Workshop

Conference

Conference: 6th Workshop on Natural Language Processing and Computational Social Science, NLP+CSS 2024
Country/Territory: Mexico
City: Mexico City
Period: 06/21/24 → …
