TY - GEN
T1 - UNIWIZ
T2 - Findings of the 62nd Annual Meeting of the Association for Computational Linguistics, ACL 2024
AU - Das, Souvik
AU - Srihari, Rohini K.
N1 - Publisher Copyright: © 2024 Association for Computational Linguistics.
PY - 2024
Y1 - 2024
N2 - Warning: This paper contains disturbing language. Large Language Models (LLMs) have made significant progress in integrating safety and knowledge alignment. However, adversarial actors can manipulate these models into generating unsafe responses, and excessive safety alignment can lead to unintended hallucinations. To address these challenges, we introduce UNIWIZ, a novel 2-step data orchestration framework that unifies safety and knowledge data generation. We propose a "safety-priming" method to generate synthetic safety data and overcome safety bottlenecks. We also inject relevant knowledge into conversations by retrieving factual information from curated sources. The UNIWIZ dataset consists of 17,638 quality-controlled conversations and 10,000 augmented preference data. Pretrained models fine-tuned on UNIWIZ show improvements across various metrics and outperform state-of-the-art instruction-tuned models trained on much larger datasets.
AB - Warning: This paper contains disturbing language. Large Language Models (LLMs) have made significant progress in integrating safety and knowledge alignment. However, adversarial actors can manipulate these models into generating unsafe responses, and excessive safety alignment can lead to unintended hallucinations. To address these challenges, we introduce UNIWIZ, a novel 2-step data orchestration framework that unifies safety and knowledge data generation. We propose a "safety-priming" method to generate synthetic safety data and overcome safety bottlenecks. We also inject relevant knowledge into conversations by retrieving factual information from curated sources. The UNIWIZ dataset consists of 17,638 quality-controlled conversations and 10,000 augmented preference data. Pretrained models fine-tuned on UNIWIZ show improvements across various metrics and outperform state-of-the-art instruction-tuned models trained on much larger datasets.
UR - https://www.scopus.com/pages/publications/85205299491
U2 - 10.18653/v1/2024.findings-acl.102
DO - 10.18653/v1/2024.findings-acl.102
M3 - Conference contribution
T3 - Proceedings of the Annual Meeting of the Association for Computational Linguistics
SP - 1749
EP - 1762
BT - The 62nd Annual Meeting of the Association for Computational Linguistics
A2 - Ku, Lun-Wei
A2 - Martins, Andre
A2 - Srikumar, Vivek
PB - Association for Computational Linguistics (ACL)
Y2 - 11 August 2024 through 16 August 2024
ER -