TY - GEN
T1 - Systematic Evaluation of GPT-3 for Zero-Shot Personality Estimation
AU - Ganesan, Adithya V.
AU - Lal, Yash Kumar
AU - Nilsson, August Håkan
AU - Schwartz, H. Andrew
N1 - Publisher Copyright: © 2023 Association for Computational Linguistics.
PY - 2023
Y1 - 2023
N2 - Very large language models (LLMs) perform extremely well on a spectrum of NLP tasks in a zero-shot setting. However, little is known about their performance on human-level NLP problems which rely on understanding psychological concepts, such as assessing personality traits. In this work, we investigate the zero-shot ability of GPT-3 to estimate the Big 5 personality traits from users’ social media posts. Through a set of systematic experiments, we find that zero-shot GPT-3 performance is somewhat close to an existing pre-trained SotA for broad classification upon injecting knowledge about the trait in the prompts. However, when prompted to provide fine-grained classification, its performance drops to close to a simple most frequent class (MFC) baseline. We further analyze where GPT-3 performs better, as well as worse, than a pretrained lexical model, illustrating systematic errors that suggest ways to improve LLMs on human-level NLP tasks.
UR - https://www.scopus.com/pages/publications/85174813390
M3 - Conference contribution
T3 - Proceedings of the Annual Meeting of the Association for Computational Linguistics
SP - 390
EP - 400
BT - WASSA 2023 - 13th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, Proceedings of the Workshop
A2 - Barnes, Jeremy
A2 - De Clercq, Orphee
A2 - Klinger, Roman
PB - Association for Computational Linguistics (ACL)
T2 - 13th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, WASSA 2023
Y2 - 14 July 2023
ER -