TY - GEN
T1 - Capturing Author Self Beliefs in Social Media Language
AU - Mangalik, Siddharth
AU - Ganesan, Adithya V.
AU - Wheeler, Abigail
AU - Kerry, Nicholas
AU - Clifton, Jeremy D.W.
AU - Schwartz, H. Andrew
AU - Boyd, Ryan L.
N1 - Publisher Copyright: © 2025 Association for Computational Linguistics.
PY - 2025
Y1 - 2025
N2 - Measuring the prevalence and dimensions of self beliefs is essential for understanding human self-perception and various psychological outcomes. In this paper, we develop a novel task for classifying language that contains explicit or implicit mentions of the author's self beliefs. We contribute a set of 2,000 human-annotated self beliefs, 100,000 LLM-labeled examples, and 10,000 surveyed self belief paragraphs. We then evaluate several encoder-based classifiers and training routines for this task. Our trained model, SelfAwareNet, achieved an AUC of 0.944, outperforming 0.839 from OpenAI's state-of-the-art GPT-4o model. Using this model we derive data-driven categories of self beliefs and demonstrate their ability to predict valence, depression, anxiety, and stress. We release the resulting self belief classification model and annotated datasets for use in future research.
UR - https://www.scopus.com/pages/publications/105021012827
U2 - 10.18653/v1/2025.acl-long.69
DO - 10.18653/v1/2025.acl-long.69
M3 - Conference contribution
T3 - Proceedings of the Annual Meeting of the Association for Computational Linguistics
SP - 1362
EP - 1376
BT - Long Papers
A2 - Che, Wanxiang
A2 - Nabende, Joyce
A2 - Shutova, Ekaterina
A2 - Pilehvar, Mohammad Taher
PB - Association for Computational Linguistics (ACL)
T2 - 63rd Annual Meeting of the Association for Computational Linguistics, ACL 2025
Y2 - 27 July 2025 through 1 August 2025
ER -