Context-DPO: Aligning Language Models for Context-Faithfulness

Baolong Bi1, Shaohan Huang2, Yiwei Wang3, Tianchi Yang2, Zihan Zhang2, Haizhen Huang2, Lingrui Mei1, Junfeng Fang4, Zehao Li1, Furu Wei2, Weiwei Deng2, Feng Sun2, Qi Zhang2, Shenghua Liu1*
1University of Chinese Academy of Sciences 2Microsoft Corporation
3University of California, Merced 4National University of Singapore

Abstract

Generating reliable and accurate responses from large language models (LLMs) hinges on their ability to faithfully adhere to user instructions and integrate retrieved information. Although alignment techniques have proven effective in aligning LLMs with human intentions and values, the dimension of enhancing context-faithfulness remains largely underexplored.

To bridge this gap, we introduce Context-DPO, the first alignment method explicitly designed to reinforce LLMs' faithfulness to contextual information. As part of this effort, we present ConFiQA, a novel benchmark crafted to simulate Retrieval-Augmented Generation (RAG) scenarios, replicating real-world knowledge conflicts to rigorously assess context-faithfulness.

By utilizing both faithful and stubborn responses to context-driven queries in ConFiQA, Context-DPO aligns LLMs through DPO, ensuring they prioritize the provided context during generation.

Extensive experimentation validates the effectiveness of Context-DPO, yielding remarkable improvements of 35% to 280% across popular open-source models. Further analysis confirms that Context-DPO not only enhances context-faithfulness but also preserves the generative strengths of LLMs, offering valuable interpretability into how models leverage contextual knowledge.


📚 ConFiQA: A New Benchmark of Context-Faithfulness

We introduce the ConFiQA benchmark to evaluate the context-faithfulness of LLMs in real-world Retrieval-Augmented Generation (RAG) scenarios involving knowledge conflicts. ConFiQA challenges LLMs to navigate conflicting knowledge and prioritize the provided context, driving advancements in RAG-based AI systems. ConFiQA consists of three datasets that reflect varying complexities and reasoning levels:

  • QA (Question-Answering): Single-hop tasks with context containing one counterfactual.
  • MR (Multi-hop Reasoning): Multi-hop tasks involving one counterfactual across multiple reasoning steps.
  • MC (Multi-Conflicts): Multi-hop tasks with context containing multiple counterfactuals, reflecting more complex conflicts.
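
The counterfactual construction above can be sketched as follows. This is an illustrative mock-up of what a single-hop (QA) instance might look like; the field names and templates here are assumptions, not the benchmark's actual schema.

```python
# Hypothetical sketch of a ConFiQA-style single-hop counterfactual QA instance.

def make_counterfactual_example(subject, relation, true_obj, counterfactual_obj):
    """Build a QA item whose context asserts a counterfactual fact."""
    context = f"{subject} {relation} {counterfactual_obj}."
    question = f"Where {relation.replace('is located', 'is')} {subject}?"  # crude template
    return {
        "context": context,
        "question": question,
        "faithful_answer": counterfactual_obj,  # follows the given context
        "stubborn_answer": true_obj,            # clings to parametric memory
    }

example = make_counterfactual_example(
    subject="The Eiffel Tower",
    relation="is located in",
    true_obj="Paris",
    counterfactual_obj="Rome",
)
print(example["faithful_answer"])  # prints: Rome
```

A faithful model should answer with the counterfactual object stated in the context, while a stubborn model falls back on the fact memorized during pretraining.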

We evaluated popular open-source and closed-source models on ConFiQA and found that context-faithfulness tends to decline as model size increases and training becomes more refined.
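
One plausible way to score such an evaluation is the fraction of answers that match the context's counterfactual fact. This is a simplified sketch; the exact metric and field names used by ConFiQA may differ.

```python
# Illustrative faithfulness scoring for ConFiQA-style items (assumed schema).

def faithfulness_rate(predictions, items):
    """Fraction of predictions containing the context's counterfactual answer."""
    hits = sum(
        1 for pred, item in zip(predictions, items)
        if item["faithful_answer"].lower() in pred.lower()
    )
    return hits / len(items)

items = [
    {"faithful_answer": "Rome"},    # context claims the Eiffel Tower is in Rome
    {"faithful_answer": "Berlin"},  # context claims the Louvre is in Berlin
]
predictions = ["It is in Rome.", "The Louvre is in Paris."]  # second answer is stubborn
print(faithfulness_rate(predictions, items))  # prints: 0.5
```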

Context-DPO: Aligning LLMs for Context-Faithfulness

We argue that modern LLMs require alignment specifically to enhance context-faithfulness. To address this, we propose Context-DPO, a novel alignment method that constructs reasoning chains based on single-hop or multi-hop knowledge to generate faithful and stubborn responses to context-driven queries. Context-DPO leverages these responses to form preference pairs that guide the model toward context-faithful behavior through DPO.
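
For reference, the per-pair DPO objective used to prefer the faithful response over the stubborn one can be written down directly. The sketch below takes precomputed sequence log-probabilities as inputs; in practice these come from the policy and a frozen reference model, and `beta` is a tuning assumption.

```python
import math

def dpo_loss(pi_logp_w, pi_logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """DPO loss for one preference pair: -log sigmoid(beta * margin).

    pi_logp_w / pi_logp_l  : policy log-probs of the chosen (faithful) and
                             rejected (stubborn) responses
    ref_logp_w / ref_logp_l: reference-model log-probs of the same responses
    """
    margin = (pi_logp_w - ref_logp_w) - (pi_logp_l - ref_logp_l)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))
```

When the policy assigns relatively more probability to the faithful response than the reference does, the margin grows and the loss shrinks, which is exactly the gradient signal that pushes generation toward the provided context.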


🔥 Our Context-DPO effectively aligns LLMs to improve context-faithfulness without compromising their generative capabilities. It consistently outperforms all existing baselines without requiring any external prompt modifications. Specifically, the aligned models achieved substantial improvements compared to their original versions: 35% for Llama2-7B-chat, 78% for Llama3-8B, 151% for Mistral-7B, and 280% for Qwen2-7B 🚀.



We have open-sourced the Context-Faithful LLMs aligned with our Context-DPO.

| Model Name | HF Checkpoint | License |
|---|---|---|
| Context-Faithful-LLaMA-2-7b-chat-hf | 🤗 Bibaolong/Context-Faithful-LLaMA-2-7b-chat-hf | Llama2-Chat |
| Context-Faithful-LLaMA-3-8b-instruct | 🤗 Bibaolong/Context-Faithful-LLaMA-3-8b-instruct | Llama3-Instruct |
| Context-Faithful-Mistral-7B-instruct | 🤗 Bibaolong/Context-Faithful-Mistral-7B-instruct-v0.2 | Mistral-Instruct |
| Context-Faithful-Qwen2-7B-Instruct | 🤗 Bibaolong/Context-Faithful-Qwen2-7B-Instruct | Qwen-Instruct |
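
A minimal usage sketch for these checkpoints follows. The prompt template here is illustrative (not necessarily the one used in the paper), and the commented lines require `transformers` plus a model download, so they are shown for reference only.

```python
# Hypothetical prompt construction for querying an aligned checkpoint.

def build_prompt(context, question):
    """Compose a context-first prompt asking the model to answer from context."""
    return (
        "Answer the question based only on the given context.\n"
        f"Context: {context}\n"
        f"Question: {question}\n"
        "Answer:"
    )

prompt = build_prompt(
    "The Eiffel Tower is located in Rome.",
    "Where is the Eiffel Tower located?",
)

# With an aligned checkpoint from the table above (requires `transformers`):
# from transformers import AutoModelForCausalLM, AutoTokenizer
# tok = AutoTokenizer.from_pretrained("Bibaolong/Context-Faithful-LLaMA-3-8b-instruct")
# model = AutoModelForCausalLM.from_pretrained("Bibaolong/Context-Faithful-LLaMA-3-8b-instruct")
# out = model.generate(**tok(prompt, return_tensors="pt"), max_new_tokens=16)
# print(tok.decode(out[0], skip_special_tokens=True))
```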

Exploring the Transformation in Context-Faithfulness

Our analysis showcases the transformative impact of Context-DPO on enhancing LLMs' context-faithfulness. The alignment reduces irrelevant and stubborn responses, leading to a significant rise in context-faithful answers. Through token-level analysis, we observe that aligned models effectively prioritize context-relevant tokens, boosting their probability distributions and improving overall response fidelity.
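
The token-level analysis above amounts to inspecting where the context-faithful token sits in the model's output distribution. The helper below, a simplified stand-in for that analysis, computes a token's softmax probability and rank from a raw logit vector.

```python
import math

def token_prob_and_rank(logits, token_id):
    """Return the softmax probability and rank (1 = highest) of one token."""
    m = max(logits)                                  # shift for numerical stability
    exps = [math.exp(z - m) for z in logits]
    prob = exps[token_id] / sum(exps)
    rank = 1 + sum(1 for z in logits if z > logits[token_id])
    return prob, rank

# Toy vocabulary of 3 tokens; token 0 plays the context-faithful token.
prob, rank = token_prob_and_rank([2.0, 0.5, -1.0], token_id=0)
print(rank)  # prints: 1
```

Alignment that boosts the logits of context-relevant tokens raises both their probability mass and their rank, which is the effect the figures below visualize.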

Logits Comparison
Figure 1: Average logits of context-faithful tokens, highlighting improvements with Context-DPO.
Kernel Density Estimation
Figure 2: Softmax ranking and probability distribution for context-faithful tokens.

These findings illustrate the internal mechanisms of Context-DPO, demonstrating its ability to significantly improve context-faithfulness alignment in LLMs. The results highlight how this alignment boosts the generation frequency of top-ranked context-faithful tokens, enhancing overall response quality without reliance on external methods.

BibTeX

@article{bi2024context,
  title={Context-DPO: Aligning Language Models for Context-Faithfulness},
  author={Bi, Baolong and Huang, Shaohan and Wang, Yiwei and Yang, Tianchi and Zhang, Zihan and Huang, Haizhen and Mei, Lingrui and Fang, Junfeng and Li, Zehao and Wei, Furu and others},
  journal={arXiv preprint arXiv:2412.15280},
  year={2024}
}