Skip to main navigation Skip to search Skip to main content

UNIWIZ: A Unified Large Language Model Orchestrated Wizard for Safe Knowledge Grounded Conversations

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

2 Scopus citations

Abstract

Warning: This paper contains disturbing language. Large Language Models (LLMs) have made significant progress in integrating safety and knowledge alignment. However, adversarial actors can manipulate these models into generating unsafe responses, and excessive safety alignment can lead to unintended hallucinations. To address these challenges, we introduce UNIWIZ, a novel 2-step data orchestration framework that unifies safety and knowledge data generation. We propose a "safety-priming" method to generate synthetic safety data and overcome safety bottlenecks. We also inject relevant knowledge into conversations by retrieving factual information from curated sources. UNIWIZ dataset consists of 17, 638 quality-controlled conversations and 10, 000 augmented preference data. Pretrained models fine-tuned on UNIWIZ show improvements across various metrics and outperform state-of-the-art instruction-tuned models trained on much larger datasets.

Original languageEnglish
Title of host publicationThe 62nd Annual Meeting of the Association for Computational Linguistics
Subtitle of host publicationFindings of the Association for Computational Linguistics, ACL 2024
EditorsLun-Wei Ku, Andre Martins, Vivek Srikumar
PublisherAssociation for Computational Linguistics (ACL)
Pages1749-1762
Number of pages14
ISBN (Electronic)9798891760998
DOIs
StatePublished - 2024
EventFindings of the 62nd Annual Meeting of the Association for Computational Linguistics, ACL 2024 - Hybrid, Bangkok, Thailand
Duration: Aug 11 2024Aug 16 2024

Publication series

NameProceedings of the Annual Meeting of the Association for Computational Linguistics

Conference

ConferenceFindings of the 62nd Annual Meeting of the Association for Computational Linguistics, ACL 2024
Country/TerritoryThailand
CityHybrid, Bangkok
Period08/11/2408/16/24

Fingerprint

Dive into the research topics of 'UNIWIZ: A Unified Large Language Model Orchestrated Wizard for Safe Knowledge Grounded Conversations'. Together they form a unique fingerprint.

Cite this