We explore how iteratively revising a chain of thoughts with the help of information retrieval significantly improves large language models' reasoning and generation ability on long-horizon generation tasks, while greatly mitigating hallucination. In particular, the proposed method, retrieval-augmented thoughts (RAT), revises each thought step one by one using information retrieved for the task query together with the current and past thought steps, after the initial zero-shot CoT is generated.
Applying RAT to various base models substantially improves their performance on a range of long-horizon generation tasks, with average relative improvements in rating scores of 13.63% on code generation, 16.96% on mathematical reasoning, 19.2% on creative writing, and 42.78% on embodied task planning.
Given a task prompt, RAT starts from the initial step-by-step thoughts produced by an LLM in zero shot ("let's think step by step"). Some thought steps may be flawed due to hallucination. RAT then iteratively revises each thought step using RAG from an external knowledge base (denoted as Library or Internet).
The detailed algorithm is as follows:
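The revision loop described above can be sketched in a few lines of Python. This is a minimal illustration, not the repository's implementation: `llm`, `retrieve`, and `split_steps` are hypothetical callables standing in for an LLM call, a RAG lookup over the external knowledge base, and a CoT-step splitter.

```python
from typing import Callable, List

def rat(task: str,
        llm: Callable[[str], str],
        retrieve: Callable[[str], str],
        split_steps: Callable[[str], List[str]]) -> List[str]:
    """Sketch of retrieval-augmented thoughts (RAT).

    llm, retrieve, and split_steps are placeholders for an LLM call,
    a retrieval lookup (Library or Internet), and a function that
    splits a draft CoT into individual thought steps.
    """
    # 1. Draft an initial zero-shot chain of thought.
    draft = llm(f"{task}\nLet's think step by step.")
    thoughts = split_steps(draft)

    revised: List[str] = []
    for step in thoughts:
        # 2. Build a retrieval query from the task query, the already
        #    revised past steps, and the current draft step.
        query = "\n".join([task] + revised + [step])
        evidence = retrieve(query)
        # 3. Revise the current step conditioned on the evidence
        #    and the previously revised steps.
        revised.append(llm(
            f"Task: {task}\nEvidence: {evidence}\n"
            f"Previous steps: {revised}\nRevise this step: {step}"
        ))
    return revised
```

Note that each step is revised with the *already revised* prefix rather than the original draft, so corrections propagate forward through the chain.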
@article{wang2024rat,
author = {Wang, Zihao and Liu, Anji and Lin, Haowei and Li, Jiaqi and Ma, Xiaojian and Liang, Yitao},
title = {RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Horizon Generation},
journal = {arXiv preprint arXiv:2403.05313},
year = {2024},
}