Context-DPO: Aligning Language Models for Context-Faithfulness

Baolong Bi1, Shaohan Huang2, Yiwei Wang3, Tianchi Yang2, Zihan Zhang2, Haizhen Huang2, Lingrui Mei1, Junfeng Fang4, Zehao Li1, Furu Wei2, Weiwei Deng2, Feng Sun2, Qi Zhang2, Shenghua Liu1*
1University of Chinese Academy of Sciences 2Microsoft Corporation
3University of California, Merced 4National University of Singapore

Abstract

Generating reliable and accurate responses from large language models (LLMs) hinges on their ability to faithfully adhere to user instructions and integrate retrieved information. Although alignment techniques have proven effective in aligning LLMs with human intentions and values, the dimension of enhancing context-faithfulness remains largely underexplored.

To bridge this gap, we introduce Context-DPO, the first alignment method explicitly designed to reinforce LLMs' faithfulness to contextual information. As part of this effort, we present ConFiQA, a novel benchmark crafted to simulate Retrieval-Augmented Generation (RAG) scenarios, replicating real-world knowledge conflicts to rigorously assess context-faithfulness.

By utilizing both faithful and stubborn responses to context-driven queries in ConFiQA, Context-DPO aligns LLMs through DPO, ensuring they prioritize the provided context during generation.

Extensive experimentation validates the effectiveness of Context-DPO, yielding remarkable improvements of 35% to 280% across popular open-source models. Further analysis confirms that Context-DPO not only enhances context-faithfulness but also preserves the generative strengths of LLMs, offering valuable interpretability into how models leverage contextual knowledge.


📚 ConFiQA: A New Benchmark of Context-Faithfulness

We introduce the ConFiQA benchmark to evaluate the context-faithfulness of LLMs in real-world Retrieval-Augmented Generation (RAG) scenarios involving knowledge conflicts. ConFiQA challenges LLMs to navigate conflicting knowledge and prioritize the provided context, driving advancements in RAG-based AI systems. ConFiQA consists of three datasets that reflect varying complexities and reasoning levels:

  • QA (Question-Answering): Single-hop tasks with context containing one counterfactual.
  • MR (Multi-hop Reasoning): Multi-hop tasks involving one counterfactual across multiple reasoning steps.
  • MC (Multi-Conflicts): Multi-hop tasks with context containing multiple counterfactuals, reflecting more complex conflicts.
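
The counterfactual construction above can be sketched as follows. This is an illustrative mock-up of what a single-hop (QA) instance might look like; the field names and templates here are assumptions, not the benchmark's actual schema.

```python
# Hypothetical sketch of a ConFiQA-style single-hop counterfactual QA instance.

def make_counterfactual_example(subject, relation, true_obj, counterfactual_obj):
    """Build a QA item whose context asserts a counterfactual fact."""
    context = f"{subject} {relation} {counterfactual_obj}."
    question = f"Where {relation.replace('is located', 'is')} {subject}?"  # crude template
    return {
        "context": context,
        "question": question,
        "faithful_answer": counterfactual_obj,  # follows the given context
        "stubborn_answer": true_obj,            # clings to parametric memory
    }

example = make_counterfactual_example(
    subject="The Eiffel Tower",
    relation="is located in",
    true_obj="Paris",
    counterfactual_obj="Rome",
)
print(example["faithful_answer"])  # prints: Rome
```

A faithful model should answer with the counterfactual object stated in the context, while a stubborn model falls back on the fact memorized during pretraining.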

We evaluated popular open-source and closed-source models on ConFiQA and found that context-faithfulness tends to decline as model size increases and training becomes more refined.
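
One plausible way to score such an evaluation is the fraction of answers that match the context's counterfactual fact. This is a simplified sketch; the exact metric and field names used by ConFiQA may differ.

```python
# Illustrative faithfulness scoring for ConFiQA-style items (assumed schema).

def faithfulness_rate(predictions, items):
    """Fraction of predictions containing the context's counterfactual answer."""
    hits = sum(
        1 for pred, item in zip(predictions, items)
        if item["faithful_answer"].lower() in pred.lower()
    )
    return hits / len(items)

items = [
    {"faithful_answer": "Rome"},    # context claims the Eiffel Tower is in Rome
    {"faithful_answer": "Berlin"},  # context claims the Louvre is in Berlin
]
predictions = ["It is in Rome.", "The Louvre is in Paris."]  # second answer is stubborn
print(faithfulness_rate(predictions, items))  # prints: 0.5
```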

Context-DPO: Aligning LLMs for Context-Faithfulness

We argue that modern LLMs require alignment specifically to enhance context-faithfulness. To address this, we propose Context-DPO, a novel alignment method that constructs reasoning chains based on single-hop or multi-hop knowledge to generate faithful and stubborn responses to context-driven queries. Context-DPO leverages these responses to form preference pairs that guide the model toward context-faithful behavior through DPO.
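
For reference, the per-pair DPO objective used to prefer the faithful response over the stubborn one can be written down directly. The sketch below takes precomputed sequence log-probabilities as inputs; in practice these come from the policy and a frozen reference model, and `beta` is a tuning assumption.

```python
import math

def dpo_loss(pi_logp_w, pi_logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """DPO loss for one preference pair: -log sigmoid(beta * margin).

    pi_logp_w / pi_logp_l  : policy log-probs of the chosen (faithful) and
                             rejected (stubborn) responses
    ref_logp_w / ref_logp_l: reference-model log-probs of the same responses
    """
    margin = (pi_logp_w - ref_logp_w) - (pi_logp_l - ref_logp_l)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))
```

When the policy assigns relatively more probability to the faithful response than the reference does, the margin grows and the loss shrinks, which is exactly the gradient signal that pushes generation toward the provided context.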


🔥 Our Context-DPO effectively aligns LLMs to improve context-faithfulness without compromising their generative capabilities. It consistently outperforms all existing baselines without requiring any external prompt modifications. Specifically, the aligned models achieved substantial improvements compared to their original versions: 35% for Llama2-7B-chat, 78% for Llama3-8B, 151% for Mistral-7B, and 280% for Qwen2-7B 🚀.



We have open-sourced the Context-Faithful LLMs aligned with our Context-DPO.

| Model Name | HF Checkpoint | License |
|---|---|---|
| Context-Faithful-LLaMA-2-7b-chat-hf | 🤗 Bibaolong/Context-Faithful-LLaMA-2-7b-chat-hf | Llama2-Chat |
| Context-Faithful-LLaMA-3-8b-instruct | 🤗 Bibaolong/Context-Faithful-LLaMA-3-8b-instruct | Llama3-Instruct |
| Context-Faithful-Mistral-7B-instruct | 🤗 Bibaolong/Context-Faithful-Mistral-7B-instruct-v0.2 | Mistral-Instruct |
| Context-Faithful-Qwen2-7B-Instruct | 🤗 Bibaolong/Context-Faithful-Qwen2-7B-Instruct | Qwen-Instruct |
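
A minimal usage sketch for these checkpoints follows. The prompt template here is illustrative (not necessarily the one used in the paper), and the commented lines require `transformers` plus a model download, so they are shown for reference only.

```python
# Hypothetical prompt construction for querying an aligned checkpoint.

def build_prompt(context, question):
    """Compose a context-first prompt asking the model to answer from context."""
    return (
        "Answer the question based only on the given context.\n"
        f"Context: {context}\n"
        f"Question: {question}\n"
        "Answer:"
    )

prompt = build_prompt(
    "The Eiffel Tower is located in Rome.",
    "Where is the Eiffel Tower located?",
)

# With an aligned checkpoint from the table above (requires `transformers`):
# from transformers import AutoModelForCausalLM, AutoTokenizer
# tok = AutoTokenizer.from_pretrained("Bibaolong/Context-Faithful-LLaMA-3-8b-instruct")
# model = AutoModelForCausalLM.from_pretrained("Bibaolong/Context-Faithful-LLaMA-3-8b-instruct")
# out = model.generate(**tok(prompt, return_tensors="pt"), max_new_tokens=16)
# print(tok.decode(out[0], skip_special_tokens=True))
```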

Exploring the Transformation in Context-Faithfulness

Our analysis showcases the transformative impact of Context-DPO on enhancing LLMs' context-faithfulness. The alignment reduces irrelevant and stubborn responses, leading to a significant rise in context-faithful answers. Through token-level analysis, we observe that aligned models effectively prioritize context-relevant tokens, boosting their probability distributions and improving overall response fidelity.
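
The token-level analysis above amounts to inspecting where the context-faithful token sits in the model's output distribution. The helper below, a simplified stand-in for that analysis, computes a token's softmax probability and rank from a raw logit vector.

```python
import math

def token_prob_and_rank(logits, token_id):
    """Return the softmax probability and rank (1 = highest) of one token."""
    m = max(logits)                                  # shift for numerical stability
    exps = [math.exp(z - m) for z in logits]
    prob = exps[token_id] / sum(exps)
    rank = 1 + sum(1 for z in logits if z > logits[token_id])
    return prob, rank

# Toy vocabulary of 3 tokens; token 0 plays the context-faithful token.
prob, rank = token_prob_and_rank([2.0, 0.5, -1.0], token_id=0)
print(rank)  # prints: 1
```

Alignment that boosts the logits of context-relevant tokens raises both their probability mass and their rank, which is the effect the figures below visualize.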

Logits Comparison
Figure 1: Average logits of context-faithful tokens, highlighting improvements with Context-DPO.
Kernel Density Estimation
Figure 2: Softmax ranking and probability distribution for context-faithful tokens.

These findings illustrate the internal mechanisms of Context-DPO, demonstrating its ability to significantly improve context-faithfulness alignment in LLMs. The results highlight how this alignment boosts the generation frequency of top-ranked context-faithful tokens, enhancing overall response quality without reliance on external methods.

BibTeX

@article{bi2024context,
  title={Context-DPO: Aligning Language Models for Context-Faithfulness},
  author={Bi, Baolong and Huang, Shaohan and Wang, Yiwei and Yang, Tianchi and Zhang, Zihan and Huang, Haizhen and Mei, Lingrui and Fang, Junfeng and Li, Zehao and Wei, Furu and others},
  journal={arXiv preprint arXiv:2412.15280},
  year={2024}
}